BLEOMYCIN GENE CLUSTER COMPONENTS AND THEIR USES 

CROSS-REFERENCE TO RELATED APPLICATIONS 

This application claims benefit under 35 U.S.C. §119 of provisional
applications USSN 60/115,435, filed on January 6, 1999, and USSN 60/118,848, filed on
February 5, 1999, both of which are herein incorporated by reference in their entirety for all
purposes.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY 
SPONSORED RESEARCH AND DEVELOPMENT 

This work was supported in part by an Institutional Research Grant from the
American Cancer Society and the School of Medicine, University of California, Davis,
National Institutes of Health Grant Number A140475, and a grant from the Searle Scholars
Program of the Chicago Community Trust. The Government of the United States of
America may have certain rights in this invention.

FIELD OF THE INVENTION 

This invention relates to the field of polyketide synthesis and nonribosomal
polypeptide synthesis. In particular, this invention pertains to the isolation of the bleomycin
gene cluster, which encodes the first identified hybrid polyketide synthase/nonribosomal
peptide synthetase pathway.

BACKGROUND OF THE INVENTION 

Polyketides and nonribosomal peptides are two large families of natural
products that include many clinically valuable drugs, such as erythromycin and vancomycin
(antibacterial), FK506 and cyclosporin (immunosuppressant), and epothilone and bleomycin
(BLM) (antitumor). The biosyntheses of polyketides and nonribosomal peptides are
catalyzed by polyketide synthases (PKSs) (Hopwood (1997) Chem. Rev. 97: 2465; Katz
(1997) Chem. Rev. 97: 2557; Khosla (1997) Chem. Rev. 97: 2577; Ikeda and Omura
(1997) Chem. Rev. 97: 2591; Staunton and Wilkinson (1997) Chem. Rev. 97: 2611; Cane et
al. (1998) Science 282: 63) and nonribosomal peptide synthetases (NRPSs) (Cane et
al. (1998) Science 282: 63; Marahiel et al. (1997) Chem. Rev. 97: 2651; von Dohren et al.
(1997) Chem. Rev. 97: 2675), respectively. Remarkably, PKSs and NRPSs use a very
similar strategy for the assembly of these two distinct classes of natural products by
sequential condensation of short carboxylic acids and amino acids, respectively, and utilize
the same 4'-phosphopantetheine prosthetic group, via a thioester linkage, to channel the
growing polyketide or peptide intermediate during the elongation processes.

Both type I PKSs and NRPSs are multifunctional proteins that are organized
into modules. (A module is defined as a set of distinctive domains that encode all the
enzyme activities necessary for one cycle of polyketide or peptide chain elongation and
associated modifications.) The number and order of modules and the type of domains within
a module on each PKS or NRPS protein determine the structural variations of the resulting
polyketide and peptide products by dictating the number, order, and choice of the carboxylic
acid or amino acid to be incorporated, and the modifications associated with a particular
cycle of elongation. These features of PKSs and NRPSs inspired us to search for a hybrid PKS
and NRPS system. Since the modular architecture of both PKS (Cane et al. (1998) Science 282:
63; Katz and Donadio (1993) Ann. Rev. Microbiol. 47: 875; Hutchinson and Fujii
(1995) Ann. Rev. Microbiol. 49: 201) and NRPS (Cane et al. (1998) Science 282: 63;
Stachelhaus et al. (1995) Science 269: 69; Stachelhaus et al. (1998) Mol. Gen. Genet. 257:
308; Belshaw et al. (1999) Science 284: 486) has been exploited successfully in
combinatorial biosynthesis of diverse "unnatural" natural products, it is envisioned that a
hybrid PKS and NRPS system, capable of incorporating both carboxylic acids and amino
acids into the final products, could lead to even greater chemical structural diversity.

The BLMs, differing structurally at the C-terminal amines of the
glycopeptides, are a family of antibiotics produced by Streptomyces verticillus (Sv). BLMs
exhibit strong antitumor activity through a metal-dependent oxidative cleavage of DNA or
RNA in the presence of molecular oxygen and are incorporated into current chemotherapy of
several malignancies under the trade name Blenoxane®, which contains BLM A2 and BLM
B2 as the principal constituents (Sikic et al. Eds. (1985) Bleomycin Chemotherapy,
Academic Press, New York; Natrajan and Hecht (1994) pages 197-242 In: Molecular
Aspects of Anticancer Drug-DNA Interaction Vol. 2, Neidle and Waring Eds., Macmillan,
London). Umezawa, Fujii, Takita, and co-workers extensively studied the biosynthesis of
BLM in Sv ATCC 15003 by feeding isotope-labeled precursors and by isolating various
biosynthetic intermediates and shunt metabolites, establishing that the BLMs are in fact
natural hybrid metabolites of polyketide and peptide biosynthesis (Takita and Muroka (1990)
pages 289-309 In: Biochemistry of Peptide Antibiotics: Recent Advances in the
Biotechnology of β-Lactams and Microbial Peptides, Kleinkauf and Von Dohren Eds., W. de
Gruyter, New York). On the assumption that BLM biosynthesis follows the paradigm for
peptide and polyketide biosynthesis, we predict that the Blm megasynthetase, which
catalyzes the assembly of the BLM backbone from nine amino acids and one acetate, should
bear the characteristics of both NRPS and PKS, providing an excellent model to study the
mechanism by which NRPS and PKS could be integrated into a productive biosynthetic
system to synthesize a hybrid peptide and polyketide metabolite (Fig. 1A) (Shen et al. (1999)
Bioorg. Chem. 27: 155).

SUMMARY OF THE INVENTION 

This invention pertains to the isolation and elucidation of the bleomycin gene
cluster. Nucleic acid sequences encoding all of the open reading frames (ORFs) that encode
polypeptides sufficient to direct the biosynthesis of bleomycin are provided. The nucleic
acids can be used in their "native" format or recombined in a wide variety of manners to
create novel synthetic pathways.
In one embodiment, this invention provides an isolated nucleic acid
comprising a nucleic acid selected from the group consisting of a nucleic acid encoding any
one of Blm open reading frames (ORFs) 8 through 41, and/or a nucleic acid encoding a
polypeptide encoded by any one of Blm open reading frames (ORFs) 8 through 41, and/or a
nucleic acid amplified by polymerase chain reaction (PCR) using any one of the primer pairs
identified in Table II and the nucleic acid of a bleomycin-producing organism as a template.
The nucleic acid may comprise one or multiple (e.g. two, more preferably 3 or more)
bleomycin open reading frames (i.e. BLM ORFs 8 through 41). One preferred nucleic acid
comprises a nucleic acid encoding a C domain lacking one or more His residues of the
conserved HHxxxDG active site for transpeptidation. In another preferred embodiment the
nucleic acid comprises a nucleic acid encoding a protein encoded by a gene selected from the
group consisting of blmI, blmII, and blmXI.
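
As an illustration of the HHxxxDG motif mentioned above, the following is a minimal sketch, not a method taken from this disclosure, of how a protein sequence might be scanned for the motif and for variants lacking one or both His residues. The function name and the two short test sequences are hypothetical, and the scan is deliberately over-inclusive: it reports every seven-residue window ending in DG.

```python
import re

def scan_c_domain_motif(protein: str):
    """Return (position, heptad, his_count) for every 'xx...DG' window.

    his_count is the number of His residues in the first two positions,
    so 2 corresponds to the conserved HHxxxDG form and 0 or 1 to a
    variant lacking one or more His residues. Illustrative only.
    """
    hits = []
    # The lookahead lets overlapping windows be reported.
    for match in re.finditer(r"(?=(..)(...)DG)", protein):
        start = match.start()
        heptad = protein[start:start + 7]
        his_count = match.group(1).count("H")
        hits.append((start, heptad, his_count))
    return hits

# Hypothetical toy sequences, not taken from any Blm ORF.
canonical = "MSTAHHILVDGWSLAK"
variant = "MSTAHQILVDGWSLAK"

if __name__ == "__main__":
    print(scan_c_domain_motif(canonical))
    print(scan_c_domain_motif(variant))
```

A real analysis would first locate the C domain by alignment, since a bare "DG" dipeptide occurs by chance in many proteins.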

In another embodiment this invention provides an isolated nucleic acid
encoding a (biosynthetic) module comprising two or more (more preferably three or more,
most preferably four or more) catalytic domains of a protein encoded by a nucleic acid of a
bleomycin gene cluster wherein said catalytic domains are selected from the group consisting
of a condensation (C) domain, an adenylation (A) domain, a peptidyl carrier protein (PCP)
domain, a condensation/cyclization domain (Cy), an acyl-carrier protein (ACP)-like domain,
an oxidization domain (Ox), a ketoacyl synthase (KS) domain, an acetyl transferase (AT)
domain, a ketoreductase (KR) domain, and a methyltransferase (MT) domain. Preferred
nucleic acids comprise a nucleic acid encoding one or more proteins comprising a module
selected from the group consisting of NRPS-0, NRPS-1, NRPS-2, NRPS-3, NRPS-4,
NRPS-5, NRPS-6, NRPS-7, NRPS-8, NRPS-9, and PKS. Particularly preferred nucleic acids
comprise an open reading frame from SEQ ID NO: 1, SEQ ID NO: 2, or SEQ ID NO: 3.
In still another embodiment, this invention provides an isolated nucleic acid
comprising a nucleic acid encoding a protein encoded by a gene from a BLM gene cluster.
Preferred nucleic acids encode a protein encoded by a gene selected from the group
consisting of blmI, blmII, and blmXI. In another embodiment, preferred nucleic acids
encode a protein encoded by a gene selected from the group consisting of blmIII, blmIV,
blmV, blmVI, blmVII, blmIX, and blmX. In still yet another embodiment, the nucleic acid
comprises a nucleic acid encoding a protein encoded by blmVIII. Particularly preferred
nucleic acids comprise a nucleic acid selected from the group consisting of blmI, blmII, and
blmXI. Other particularly preferred nucleic acids comprise a nucleic acid selected from the
group consisting of blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX, while still other
particularly preferred nucleic acids comprise blmVIII.

In still yet another embodiment, this invention provides an isolated nucleic
acid comprising a nucleic acid that encodes a protein comprising at least one catalytic
domain selected from the group consisting of a condensation (C) domain, an adenylation (A)
domain, a peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an
acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), a ketoacyl synthase
(KS) domain, an acetyl transferase (AT) domain, a ketoreductase (KR) domain, and a
methyltransferase (MT) domain, and that hybridizes to a nucleic acid selected from the group
consisting of orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15, orf16, orf17, orf18,
orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28, orf29, orf30, orf31, orf32,
orf33, orf34, orf35, orf36, orf37, orf38, orf39, and orf40 under stringent conditions. In
certain embodiments this also includes nucleic acids that would stringently hybridize as
indicated above but for the degeneracy of the genetic code. In other words, if silent
mutations could be made in the subject sequence so that it hybridizes to the indicated
sequence(s) under stringent conditions, it would be included in certain embodiments. A
preferred isolated nucleic acid comprises a nucleic acid encoding a module. A particularly
preferred isolated nucleic acid comprises a nucleic acid encoding a BLM gene.

This invention also provides a nucleic acid comprising a nucleic acid selected
from the group consisting of orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15,
orf16, orf17, orf18, orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28,
orf29, orf30, orf31, orf32, orf33, orf34, orf35, orf36, orf37, orf38, orf39, and orf40, or an
allelic variant thereof. Preferred nucleic acids comprise a nucleic acid that is a single
nucleotide polymorphism (SNP) of a nucleic acid selected from the group consisting of
orf8, orf9, orf10, orf11, orf12, orf13, orf14, orf15, orf16, orf17, orf18,
orf19, orf20, orf21, orf22, orf23, orf24, orf25, orf26, orf27, orf28, orf29, orf30, orf31, orf32,
orf33, orf34, orf35, orf36, orf37, orf38, orf39, and orf40.

This invention also provides an isolated gene cluster comprising open reading 
frames encoding polypeptides sufficient to direct the assembly of a bleomycin. 

In one embodiment this invention provides an isolated multi-functional
protein complex comprising both a polyketide synthase (PKS) and a nonribosomal peptide
synthetase (NRPS) and/or an isolated nucleic acid encoding a multi-functional protein complex
comprising both a polyketide synthase (PKS) and a nonribosomal peptide synthetase (NRPS).

This invention also provides various blm cluster polypeptides or blm cluster-derived
polypeptides. Thus, in one embodiment this invention provides an isolated
polypeptide comprising a catalytic domain encoded by a nucleic acid of a bleomycin gene
cluster wherein said nucleic acid comprises a nucleic acid selected from the group consisting
of a nucleic acid encoding any one of Blm open reading frames (ORFs) 8 through 41; and/or
a nucleic acid amplified by polymerase chain reaction (PCR) using any one of the primer
pairs identified in Table II. Preferred polypeptides comprise an enzymatic domain selected
from the group consisting of a condensation (C) domain, an adenylation (A) domain, a
peptidyl carrier protein (PCP) domain, a condensation/cyclization domain (Cy), an
acyl-carrier protein (ACP)-like domain, an oxidization domain (Ox), a ketoacyl synthase (KS)
domain, an acetyl transferase (AT) domain, a ketoreductase (KR) domain, and a
methyltransferase (MT) domain. Particularly preferred polypeptides are encoded by the
nucleic acids described above and herein.

This invention also provides expression vectors comprising any of the nucleic
acids described herein and/or host cells (e.g. Streptomyces) transfected and/or transformed
with any of these expression vectors. A preferred host cell is transformed with an exogenous
nucleic acid comprising a gene cluster encoding polypeptides sufficient to direct the
assembly of a bleomycin or bleomycin analog.

This invention also provides methods of use of the blm and blm-derived
nucleic acid(s) and/or polypeptides. One such method is a method of chemically modifying
a biological molecule. The method involves contacting a biological molecule that is a
substrate for a polypeptide encoded by one or more bleomycin biosynthesis gene cluster
open reading frames with the polypeptide encoded by one or more bleomycin biosynthesis
gene cluster open reading frames, whereby the polypeptide chemically modifies the
biological molecule. In one particularly preferred embodiment, the biological molecule is an
amino acid and said polypeptide is a peptide synthetase. In another preferred embodiment,
the polypeptide is a methyl transferase. Other substrates and blm encoded polypeptides are
illustrated in Table II.

In another embodiment this invention provides a method of coupling a first
amino acid to a second amino acid. This method involves contacting the first and second
amino acid with a recombinantly expressed bleomycin nonribosomal peptide synthetase
(NRPS). A preferred NRPS is selected from the group consisting of NRPS-5, NRPS-4,
NRPS-3, NRPS-9, NRPS-8, and NRPS-7. Another preferred NRPS is selected from the
group consisting of NRPS-6, NRPS-2, NRPS-1, and NRPS-0. The contacting can be in vivo
(e.g. in a host cell) or ex vivo.

In another embodiment this invention provides a method of coupling a first
fatty acid to a second fatty acid, said method comprising contacting the first and second fatty
acids with a recombinantly expressed bleomycin polyketide synthase (PKS). Again, the
contacting can be in vivo (e.g. in a host cell) or ex vivo.

In still another embodiment, this invention provides a method of producing a
bleomycin or bleomycin analog. The method involves providing a cell transformed with an
exogenous nucleic acid comprising a bleomycin gene cluster encoding polypeptides
sufficient to direct the assembly of said bleomycin or bleomycin analog; culturing the cell
under conditions permitting the biosynthesis of bleomycin or bleomycin analog; and
isolating said bleomycin or bleomycin analog from said cell.

This invention also provides an isolated nucleic acid comprising a nucleic
acid encoding a phosphopantetheinyl transferase, said nucleic acid encoding a
phosphopantetheinyl transferase being selected from the group consisting of: a nucleic acid
encoding the protein encoded by the nucleic acid of SEQ ID NO: 3; a nucleic acid amplified
by polymerase chain reaction (PCR) using primers that specifically amplify ORF 41
(primers: SEQ ID NO: 71 and SEQ ID NO: 72) and Streptomyces nucleic acid as a template; and
a nucleic acid encoding a polypeptide having phosphopantetheinyl transferase activity where
said nucleic acid specifically hybridizes to the nucleic acid of SEQ ID NO: 3 under stringent
conditions. In one embodiment, the nucleic acid comprises the nucleic acid of SEQ ID
NO: 3.

In another embodiment, this invention provides a polypeptide comprising a
phosphopantetheinyl transferase encoded by SEQ ID NO: 3 or a polypeptide having
phosphopantetheinyl transferase activity and the sequence encoded by the nucleic acid of
SEQ ID NO: 3 or conservative substitutions of that polypeptide.

Also provided are vectors comprising a nucleic acid encoding a
phosphopantetheinyl transferase (e.g., as described above) and cells transfected with the
vector.

This invention also provides a method of converting an apo-carrier protein to
a holo-carrier protein, said method comprising reacting said apo-carrier protein with a
recombinant phosphopantetheinyl transferase encoded by SEQ ID NO: 3 and coenzyme A,
thereby producing a holo-carrier protein.

In certain embodiments, this invention specifically excludes one or more of 
open reading frames 1 through 41. In particularly preferred embodiments, this invention 
excludes open reading frames 1 through 7 (Orf 1- Orf 7). 

DEFINITIONS

The term "polyketide synthases" (PKSs) refers to multifunctional enzymes
related to fatty acid synthases (FASs). PKSs catalyze the biosynthesis of polyketides through
repeated (decarboxylative) Claisen condensations between acylthioesters, usually acetyl,
propionyl, malonyl or methylmalonyl. Following each condensation, they typically
introduce structural variability into the product by catalyzing all, part, or none of a reductive
cycle comprising a ketoreduction, dehydration, and enoylreduction on the β-keto group of
the growing polyketide chain. PKSs incorporate enormous structural diversity into their
products, in addition to varying the condensation cycle, by controlling the overall chain
length, choice of primer and extender units and, particularly in the case of aromatic
polyketides, regiospecific cyclizations of the nascent polyketide chain. After the carbon
chain has grown to a length characteristic of each specific product, it is typically released
from the synthase by thiolysis or acyltransfer. Thus, PKSs consist of families of enzymes
which work together to produce a given polyketide. Two general classes of PKSs exist. One
class, known as Type I PKSs, is represented by the PKSs for macrolides such as
erythromycin. These "complex" or "modular" PKSs include assemblies of several large
multifunctional proteins carrying, between them, a set of separate active sites for each step of
carbon chain assembly and modification (Cortes et al. (1990) Nature 348: 176; Donadio et
al. (1991) Science 252: 675; MacNeil et al. (1992) Gene 115: 119). Structural diversity
occurs in this class from variations in the number and type of active sites in the PKSs. This
class of PKSs displays a one-to-one correlation between the number and clustering of active
sites in the primary sequence of the PKS and the structure of the polyketide backbone. The
second class of PKSs, called Type II PKSs, is represented by the synthases for aromatic
compounds. Type II PKSs typically have a single set of iteratively used active sites (Bibb et
al. (1989) EMBO J. 8: 2727; Sherman et al. (1989) EMBO J. 8: 2717; Fernandez-Moreno
et al. (1992) J. Biol. Chem. 267: 19278).

A "nonribosomal peptide synthase" (NRPS) refers to an enzymatic complex 
of eucaryotic or procaryotic origin, that is responsible for the synthesis of peptides by a 
nonribosomal mechanism, often known as thiotemplate synthesis (Kleinkauf and von 
Doehren (1987) Ann. Rev. Microbiol, 41: 259-289). Such peptides, which can be up to 20 or 
more amino acids in length, can have a linear, cyclic (cyclosporine, tyrocidine, 
mycobacilline, surfactin and others) or branched cyclic structure (polymyxin, bacitracin and 
others) and often contain amino acids not present in proteins or modified amino acids 
through methylation or epimerization. 

A "module" refers to a set of distinctive polypeptide domains that encode all 
the enzyme activities necessary for one cycle of polyketide or peptide chain elongation and 
associated modifications. 
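
The definition of a module can be made concrete with a small data model. The sketch below is illustrative only: the class, the module names, and the three-module assembly line are hypothetical and do not describe the actual Blm megasynthetase. It assumes, for simplicity, that a module containing a KS domain is a PKS module and that one containing an A domain is an NRPS module.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Module:
    name: str              # hypothetical label
    domains: List[str]     # domain abbreviations as used in the text
    building_block: str    # unit added in this elongation cycle

def module_kind(module: Module) -> str:
    """Classify a module by a signature domain (simplifying assumption)."""
    if "KS" in module.domains:
        return "PKS"
    if "A" in module.domains:
        return "NRPS"
    return "unknown"

def is_hybrid(modules: List[Module]) -> bool:
    """True when the assembly line contains both PKS and NRPS modules."""
    kinds = {module_kind(m) for m in modules}
    return {"PKS", "NRPS"} <= kinds

def backbone_order(modules: List[Module]) -> List[str]:
    """One building block per module, in module order."""
    return [m.building_block for m in modules]

# Hypothetical three-module assembly line.
EXAMPLE_LINE = [
    Module("module-1", ["C", "A", "PCP"], "amino acid 1"),
    Module("module-2", ["KS", "AT", "KR", "ACP"], "carboxylic acid"),
    Module("module-3", ["Cy", "A", "PCP"], "amino acid 2"),
]

if __name__ == "__main__":
    print(is_hybrid(EXAMPLE_LINE))
    print(backbone_order(EXAMPLE_LINE))
```

The point of the model is the one stated in the background: the number and order of modules fix the number and order of incorporated units.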

The terms "isolated", "purified", or "biologically pure" refer to material which
is substantially or essentially free from components which normally accompany it as found
in its native state. With respect to nucleic acids and/or polypeptides, the terms can refer to
nucleic acids or polypeptides that are no longer flanked by the sequences typically flanking
them in nature.

The terms "polypeptide", "peptide" and "protein" are used interchangeably 
herein to refer to a polymer of amino acid residues. The terms apply to amino acid polymers 
in which one or more amino acid residue is an artificial chemical analogue of a 
corresponding naturally occurring amino acid, as well as to naturally occurring amino acid 
polymers. The term also includes variants on the traditional peptide linkage joining the 
amino acids making up the polypeptide. 

The terms "nucleic acid" or "oligonucleotide" or grammatical equivalents
herein refer to at least two nucleotides covalently linked together. A nucleic acid of the
present invention is preferably single-stranded or double-stranded and will generally contain
phosphodiester bonds, although in some cases, as outlined below, nucleic acid analogs are
included that may have alternate backbones, comprising, for example, phosphoramide
(Beaucage et al. (1993) Tetrahedron 49(10): 1925 and references therein; Letsinger (1970)
J. Org. Chem. 35: 3800; Sprinzl et al. (1977) Eur. J. Biochem. 81: 579; Letsinger et al. (1986)
Nucl. Acids Res. 14: 3487; Sawai et al. (1984) Chem. Lett. 805; Letsinger et al. (1988) J. Am.
Chem. Soc. 110: 4470; and Pauwels et al. (1986) Chemica Scripta 26: 1419),
phosphorothioate (Mag et al. (1991) Nucleic Acids Res. 19: 1437; and U.S. Patent No.
5,644,048), phosphorodithioate (Briu et al. (1989) J. Am. Chem. Soc. 111: 2321),
O-methylphosphoroamidite linkages (see Eckstein, Oligonucleotides and Analogues: A
Practical Approach, Oxford University Press), and peptide nucleic acid backbones and
linkages (see Egholm (1992) J. Am. Chem. Soc. 114: 1895; Meier et al. (1992) Chem. Int. Ed.
Engl. 31: 1008; Nielsen (1993) Nature 365: 566; Carlsson et al. (1996) Nature 380: 207).
Other analog nucleic acids include those with positive backbones (Dempcy et al. (1995)
Proc. Natl. Acad. Sci. USA 92: 6097); non-ionic backbones (U.S. Patent Nos. 5,386,023,
5,637,684, 5,602,240, 5,216,141 and 4,469,863; Angew. (1991) Chem. Intl. Ed. English 30:
423; Letsinger et al. (1988) J. Am. Chem. Soc. 110: 4470; Letsinger et al. (1994) Nucleoside
& Nucleotide 13: 1597; Chapters 2 and 3, ASC Symposium Series 580, "Carbohydrate
Modifications in Antisense Research", Ed. Y.S. Sanghui and P. Dan Cook; Mesmaeker et al.
(1994) Bioorganic & Medicinal Chem. Lett. 4: 395; Jeffs et al. (1994) J. Biomolecular NMR
34: 17; Tetrahedron Lett. 37: 743 (1996)) and non-ribose backbones, including those
described in U.S. Patent Nos. 5,235,033 and 5,034,506, and Chapters 6 and 7, ASC
Symposium Series 580, "Carbohydrate Modifications in Antisense Research", Ed. Y.S.
Sanghui and P. Dan Cook. Nucleic acids containing one or more carbocyclic sugars are also
included within the definition of nucleic acids (see Jenkins et al. (1995) Chem. Soc. Rev.
pp. 169-176). Several nucleic acid analogs are described in Rawls, C & E News June 2, 1997
page 35. These modifications of the ribose-phosphate backbone may be done to facilitate the
addition of additional moieties such as labels, or to increase the stability and half-life of such
molecules in physiological environments.

The term "heterologous" as it relates to nucleic acid sequences such as coding
sequences and control sequences, denotes sequences that are not normally associated with a
region of a recombinant construct, and/or are not normally associated with a particular cell.
Thus, a "heterologous" region of a nucleic acid construct is an identifiable segment of
nucleic acid within or attached to another nucleic acid molecule that is not found in
association with the other molecule in nature. For example, a heterologous region of a
construct could include a coding sequence flanked by sequences not found in association
with the coding sequence in nature. Another example of a heterologous coding sequence is a
construct where the coding sequence itself is not found in nature (e.g., synthetic sequences
having codons different from the native gene). Similarly, a host cell transformed with a
construct which is not normally present in the host cell would be considered heterologous for
purposes of this invention.

A "coding sequence" or a sequence which "encodes" a particular polypeptide
(e.g. a PKS, an NRPS, etc.), is a nucleic acid sequence which is ultimately transcribed and/or
translated into that polypeptide in vitro and/or in vivo when placed under the control of
appropriate regulatory sequences. In certain embodiments, the boundaries of the coding
sequence are determined by a start codon at the 5' (amino) terminus and a translation stop
codon at the 3' (carboxy) terminus. A coding sequence can include, but is not limited to,
cDNA from procaryotic or eucaryotic mRNA, genomic DNA sequences from procaryotic or
eucaryotic DNA, and even synthetic DNA sequences. In preferred embodiments, a
transcription termination sequence will usually be located 3' to the coding sequence.

Expression "control sequences" refers collectively to promoter sequences,
ribosome binding sites, polyadenylation signals, transcription termination sequences,
upstream regulatory domains, enhancers, and the like, which collectively provide for the
transcription and translation of a coding sequence in a host cell. Not all of these control
sequences need always be present in a recombinant vector so long as the desired gene is
capable of being transcribed and translated.

"Recombination" refers to the reassortment of sections of DNA or RNA
sequences between two DNA or RNA molecules. "Homologous recombination" occurs
between two DNA molecules which hybridize by virtue of homologous or complementary
nucleotide sequences present in each DNA molecule.

The terms "stringent conditions" or "hybridization under stringent conditions"
refer to conditions under which a probe will hybridize preferentially to its target
subsequence, and to a lesser extent to, or not at all to, other sequences. "Stringent
hybridization" and "stringent hybridization wash conditions" in the context of nucleic acid
hybridization experiments such as Southern and northern hybridizations are sequence
dependent, and are different under different environmental parameters. An extensive guide
to the hybridization of nucleic acids is found in Tijssen (1993) Laboratory Techniques in
Biochemistry and Molecular Biology - Hybridization with Nucleic Acid Probes, part I, chapter
2, Overview of principles of hybridization and the strategy of nucleic acid probe assays,
Elsevier, New York. Generally, highly stringent hybridization and wash conditions are
selected to be about 5°C lower than the thermal melting point (Tm) for the specific sequence
at a defined ionic strength and pH. The Tm is the temperature (under defined ionic strength
and pH) at which 50% of the target sequence hybridizes to a perfectly matched probe. Very
stringent conditions are selected to be equal to the Tm for a particular probe.
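
The relationship between Tm and the stringency temperatures described above can be sketched numerically. The Tm estimate used here is the widely cited empirical approximation for DNA-DNA hybrids (81.5 + 16.6 log10[Na+] + 0.41 %GC - 0.61 %formamide - 500/L); it is an assumption of this example and is not given in this specification. The function names are hypothetical.

```python
import math

def estimate_tm_celsius(na_molar: float, gc_percent: float,
                        formamide_percent: float, length_bp: int) -> float:
    """Approximate Tm of a DNA-DNA hybrid (empirical formula, assumed here)."""
    return (81.5
            + 16.6 * math.log10(na_molar)
            + 0.41 * gc_percent
            - 0.61 * formamide_percent
            - 500.0 / length_bp)

def highly_stringent_temperature(tm_celsius: float) -> float:
    """Per the text: about 5 degrees C below Tm."""
    return tm_celsius - 5.0

def very_stringent_temperature(tm_celsius: float) -> float:
    """Per the text: equal to Tm."""
    return tm_celsius

if __name__ == "__main__":
    # Illustrative inputs: 1 M Na+, 50% GC, no formamide, 100 bp duplex.
    tm = estimate_tm_celsius(1.0, 50.0, 0.0, 100)
    print(tm, highly_stringent_temperature(tm), very_stringent_temperature(tm))
```

The estimate is only a starting point; in practice the conditions are confirmed empirically for each probe.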

An example of stringent hybridization conditions for hybridization of
complementary nucleic acids which have more than 100 complementary residues on a filter
in a Southern or northern blot is 50% formamide with 1 mg of heparin at 42°C, with the
hybridization being carried out overnight. An example of highly stringent wash conditions is
0.15 M NaCl at 72°C for about 15 minutes. An example of stringent wash conditions is a
0.2x SSC wash at 65°C for 15 minutes (see Sambrook et al. (1989) Molecular Cloning - A
Laboratory Manual (2nd ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor
Press, NY, for a description of SSC buffer). Often, a high stringency wash is preceded by a
low stringency wash to remove background probe signal. An example medium stringency
wash for a duplex of, e.g., more than 100 nucleotides, is 1x SSC at 45°C for 15 minutes. An
example low stringency wash for a duplex of, e.g., more than 100 nucleotides, is 4-6x SSC at
40°C for 15 minutes. In general, a signal to noise ratio of 2x (or higher) than that observed
for an unrelated probe in the particular hybridization assay indicates detection of a specific
hybridization. Nucleic acids which do not hybridize to each other under stringent conditions
are still substantially identical if the polypeptides which they encode are substantially
identical. This occurs, e.g., when a copy of a nucleic acid is created using the maximum
codon degeneracy permitted by the genetic code.
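
The last point, that two nucleic acids can encode identical polypeptides while sharing little nucleotide identity, can be illustrated with a short sketch. The two 12-nucleotide sequences are hypothetical and were chosen only to exploit codon degeneracy; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W"
         "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR"
         "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna: str) -> str:
    """Translate a DNA coding sequence (reading frame 1, full codons only)."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))

def nucleotide_identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical sequences that both encode Met-Ser-Leu-Arg.
SEQ_A = "ATGTCCCTGCGC"
SEQ_B = "ATGAGTTTAAGA"

if __name__ == "__main__":
    print(translate(SEQ_A), translate(SEQ_B))
    print(nucleotide_identity(SEQ_A, SEQ_B))
```

Sequences this divergent would not be expected to hybridize under stringent conditions, yet they are substantially identical in the sense defined above because the encoded peptides match.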

A "library" or "combinatorial library" of polyketides and/or polypeptides is
intended to mean a collection of polyketides and/or polypeptides (or other molecules)
catalytically produced by a PKS and/or NRPS and/or hybrid PKS/NRPS (or other possible
combination of synthetic elements) gene cluster. The library can be produced by a gene
cluster that contains any combination of native, homolog or mutant genes from aromatic,
modular or fungal PKSs and/or NRPSs. The combination of genes can be derived from a
single PKS and/or NRPS gene cluster, e.g., act, fren, gra, tcm, whiE, gris, ery, or the like,
and may optionally include genes encoding tailoring enzymes which are capable of
catalyzing the further modification of a polypeptide, polyketide, or other molecule.
Alternatively, the combination of genes can be rationally or stochastically derived from an
assortment of NRPS and/or PKS gene clusters. The library of polyketides and/or
polypeptides and/or other molecules thus produced can be tested or screened for biological,
pharmacological or other activity.

By "random assortment" is intended any combination and/or order of genes,
homologs or mutants which encode the various PKS and/or NRPS enzymes, modules,
active sites or portions thereof derived from aromatic, modular or fungal PKS and/or NRPS
gene clusters.
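
To give a sense of how quickly an assortment of modules expands into a library, the sketch below enumerates every ordered combination of modules drawn from a small pool. The pool, its module labels, and the rule used to recognize a hybrid (names beginning with "NRPS" or "PKS") are hypothetical conventions of this example only.

```python
from itertools import product
from typing import List, Sequence

def assortments(module_pool: Sequence[str], n_positions: int) -> List[List[str]]:
    """Every ordered choice of n_positions modules, with reuse allowed."""
    return [list(combo) for combo in product(module_pool, repeat=n_positions)]

def is_hybrid(combo: Sequence[str]) -> bool:
    """True if the combination mixes NRPS and PKS modules (by name prefix)."""
    has_nrps = any(name.startswith("NRPS") for name in combo)
    has_pks = any(name.startswith("PKS") for name in combo)
    return has_nrps and has_pks

# Hypothetical pool of three modules.
POOL = ["NRPS-a", "NRPS-b", "PKS-a"]

if __name__ == "__main__":
    library = assortments(POOL, 2)
    hybrids = [combo for combo in library if is_hybrid(combo)]
    print(len(library), len(hybrids))
```

With a pool of m modules and n positions the enumeration grows as m to the power n, which is why such libraries are screened rather than characterized one member at a time.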

By "genetically engineered host cell" is meant a host cell where the native
PKS and/or NRPS gene cluster has been altered or deleted using recombinant DNA
techniques or a host cell into which a heterologous PKS and/or NRPS and/or hybrid
PKS/NRPS gene cluster has been inserted. Thus, the term would not encompass mutational
events occurring in nature. A "host cell" is a cell derived from a procaryotic microorganism
or a eucaryotic cell line cultured as a unicellular entity, which can be, or has been, used as a
recipient for recombinant vectors bearing the PKS, NRPS, and/or hybrid gene clusters of the
invention. The term includes the progeny of the original cell which has been transfected. It
is understood that the progeny of a single parental cell may not necessarily be completely
identical in morphology or in genomic or total DNA complement to the original parent, due
to accidental or deliberate mutation. Progeny of the parental cell which are sufficiently
similar to the parent to be characterized by the relevant property, such as the presence of a
nucleotide sequence encoding a desired PKS, are included in the definition, and are covered
by the above terms.

Expression vectors are defined herein as nucleic acid sequences that direct
the transcription of cloned copies of genes/cDNAs and/or the translation of their mRNAs in
an appropriate host. Such vectors can be used to express genes or cDNAs in a variety of
hosts such as bacteria, bluegreen algae, plant cells, insect cells and animal cells. Expression
vectors include, but are not limited to, cloning vectors, modified cloning vectors, specifically
designed plasmids or viruses. Specifically designed vectors allow the shuttling of DNA
between hosts, such as bacteria-yeast or bacteria-animal cells. An appropriately constructed
expression vector preferably contains: an origin of replication for autonomous replication in
a host cell, a selectable marker, optionally one or more restriction enzyme sites, and optionally
one or more constitutive or inducible promoters. In preferred embodiments, an expression
vector is a replicable DNA construct in which a DNA sequence encoding one or more PKS
and/or NRPS domains and/or modules is operably linked to suitable control sequences
capable of effecting the expression of the products of these synthases and/or synthetases in a
suitable host. Control sequences include a transcriptional promoter, an optional operator
sequence to control transcription and sequences which control the termination of
transcription and translation, and so forth.

A "bleomycin open reading frame", or "bleomycin ORF", or "BLM ORF"
refers to a nucleic acid open reading frame that encodes a polypeptide or polypeptide domain 
that has an enzymatic activity used in the biosynthesis of a bleomycin. 

A "PKS/NRPS/PKS" system refers to a synthetic system comprising an NRPS 
flanked by two PKSs. A "NRPS/PKS/NRPS" system refers to a synthetic system comprising
a PKS flanked by two NRPSs. A "hybrid PKS/NRPS system" or a "hybrid NRPS/PKS 
system" refers to a hybrid synthetic system comprising at least one PKS and one NRPS 
module. The system can comprise multiple modules and the order can vary. 

A "biological molecule that is a substrate for a polypeptide encoded by a
bleomycin biosynthesis gene" refers to a molecule that is chemically modified by one or
more polypeptides encoded by open reading frame(s) of the blm gene cluster. The
"substrate" may be a native molecule that typically participates in the biosynthesis of a
bleomycin, or can be any other molecule that can be similarly acted upon by the polypeptide.

A "polymorphism" is a variation in the DNA sequence of some members of a
species. A polymorphism is thus said to be "allelic," in that, due to the existence of the
polymorphism, some members of a species may have the unmutated sequence (i.e. the
original "allele") whereas other members may have a mutated sequence (i.e. the variant or
mutant "allele"). In the simplest case, only one mutated sequence may exist, and the
polymorphism is said to be diallelic. In the case of diallelic diploid organisms, three
genotypes are possible. They can be homozygous for one allele, homozygous for the other
allele or heterozygous. In the case of diallelic haploid organisms, they can have one allele or
the other, thus only two genotypes are possible. The occurrence of alternative mutations can
give rise to triallelic, etc. polymorphisms. An allele may be referred to by the nucleotide(s)
that comprise the mutation.
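
The genotype counts given in this definition follow from counting unordered allele pairs. The following minimal Python sketch enumerates them; the function names and allele labels are illustrative assumptions and are not part of this disclosure.

```python
from itertools import combinations_with_replacement

def diploid_genotypes(alleles):
    """All unordered allele pairs: the homozygotes plus the heterozygotes."""
    return list(combinations_with_replacement(alleles, 2))

def haploid_genotypes(alleles):
    """A haploid organism carries exactly one allele."""
    return [(allele,) for allele in alleles]

diallelic = ["A", "a"]            # illustrative labels: original and variant allele
triallelic = ["A", "a1", "a2"]    # two alternative mutations at the same site

print(diploid_genotypes(diallelic))         # two homozygotes and one heterozygote
print(len(haploid_genotypes(diallelic)))    # one genotype per allele
print(len(diploid_genotypes(triallelic)))   # grows as n(n+1)/2 for n alleles
```

In general, n alleles give n genotypes in a haploid organism and n(n+1)/2 genotypes in a diploid organism.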

"Single nucleotide polymorphisms" or "SNPs" are defined by their
characteristic attributes. A central attribute of such a polymorphism is that it contains a
polymorphic site, "X," most preferably occupied by a single nucleotide, which is the site of
the polymorphism's variation (Goelet and Knapp U.S. patent application Ser. No.
08/145,145). Methods of identifying SNPs are well known to those of skill in the art (see,
e.g., U.S. Patent 5,952,174).

The following abbreviations are used herein: A, adenylation; ACP, acyl
carrier protein; AT, acyltransferase; BLM, bleomycin; C, condensation; Cy,
condensation/cyclization; KR, ketoreductase; KS, ketoacyl synthase; MT, methyltransferase;
NRPS, nonribosomal peptide synthetase; orf, open reading frame; Ox, oxidation; PCP,
peptidyl carrier protein; PCR, polymerase chain reaction; PKS, polyketide synthase; Sv,
Streptomyces verticillus; ArCP, aryl carrier protein; bp, base pair; CoA, coenzyme A; DTT,
dithiothreitol; FAS, fatty acid synthase; kb, kilobase; PPTase, 4'-phosphopantetheinyl
transferase; TCA, trichloroacetic acid; and DEBS, 6-deoxyerythronolide B synthase.

BRIEF DESCRIPTION OF THE DRAWINGS

Figures 1A and 1B illustrate the biosynthetic pathway for bleomycin in Sv
(ATCC 15003). Figure 1A illustrates a biosynthetic pathway for BLM in Sv ATCC 15003;
intermediates except those in brackets were identified. Figure 1B shows a linear model for
the Blm megasynthetase-templated assembly of the BLM peptide/polyketide/peptide
aglycone from nine amino acids and one acetate; shaded circles represent atypical domains
carrying out the proposed novel chemistry, and arrows with broken lines indicate where
biosynthetic intermediates were derailed. Three-letter amino acid designations were used.
[HO], hydroxylation; [H], reduction.

Figure 2 provides a restriction map and gene organization of the blm gene
cluster from Sv ATCC 15003 (B, BamHI). Proposed functions for individual open reading
frames are summarized in Tables I and II. Modules for individual NRPS and PKS are
given along with their proposed substrates in parentheses.

Figures 3A, 3B, 3C, and 3D illustrate the determination of substrate
specificity for NRPS-1 and NRPS-6. Figure 3A shows a comparison of the A3 to A6 region
of A domains to 84 NRPS modules available at GenBank that activate various amino acids.
Figure 3B shows a comparison of amino acid residues that putatively line the substrate
binding pockets for A domains (single-letter amino acid designations were used). The
number following the protein name indicates the order of a particular A domain in the
multimodular NRPS protein. The protein accession numbers are P48663 (HMWP2), P19828
(AngR), AAC06346 (BacA-2), CAB03756 (MbtB), 3510629 (SyrE-7), 3114612 (AcmB-1),
CAA67248 (SnbC-1), and 3560507 (FxbC-2). Dhb stands for 2,3-dehydroaminobutyric
acid. It is not known if Dhb is the direct substrate for SyrE-7 or resulted from dehydration of
an SyrE-7 activated Thr (Guenzi et al. (1998) J. Biol. Chem. 273: 32857-32863). Figure 3C
illustrates purified proteins after overexpression in E. coli as analyzed by electrophoresis on
a 10% SDS-polyacrylamide gel (the calculated molecular weights for NRPS-1A and NRPS-6A
are 64,212 and 61,899, respectively). Figure 3D illustrates substrate specificities as
determined by the ATP-PPi exchange reaction with the amino acids of BLM as substrates
(100% relative activity corresponds to 103,000 cpm for NRPS-1A and 256,000 cpm for
NRPS-6A).

Figure 4 illustrates a three-module NRPS/PKS/NRPS model for channeling 
the growing intermediate between NRPS and PKS modules and between PKS and NRPS 
modules. The KS, ACP, and C domains are shaded to emphasize their unique activities that
are responsible for elongating a growing peptide with a short carboxylic acid and a growing 
polyketide with an amino acid in hybrid peptide/polyketide/peptide biosynthesis. 

Figure 5 illustrates the use of the blmVIII methyltransferase domain to introduce
branched methyl groups in a polyketide synthesis. PCK12 has been described by Kao et al.
(1995) J. Am. Chem. Soc. 117: 9105-9106. DE-1, DE-2 and DE-3 are three representative
products demonstrating the strategy and utility of blmVIII in introducing a CH3 group in
polyketide biosynthesis.

Figure 6 illustrates the use of the blm NRPS and PKS enzymes to synthesize a 
variety of hybrid polyketide/peptide molecules including, but not limited to, a family of 
oxazolines/oxazoles, and thiazolines/thiazoles.

Figure 7 illustrates the use of elements of the blm gene cluster to synthesize
various sugars.

Figure 8A shows a restriction map of the blm gene cluster from Sv
ATCC 15003 (B, BamHI). Figure 8B shows the relative position of the blmI, blmII, and blmXI
genes to the two blmAB resistance genes (blmR, Blm resistance). Individual open reading
frames are represented by open arrows. Figure 8C shows the nucleotide sequence of the
blmI gene. The potential ribosome-binding site (RBS) and the conserved motif for 4'-
phosphopantetheinylation are underlined. The sequence has been deposited into GenBank
under accession no.

Figure 9 shows an amino acid sequence comparison of BlmI with PCP
domains of known type I NRPSs (Grs\2 [P14688], 36% identity, 58% similarity; Srfa-3
[Q08787], 40% identity, 64% similarity; Vir-s [Y11547], 36% identity, 60% similarity;
Saf-b [U24657], 40% identity, 54% similarity). Given in brackets are nucleotide sequence
accession numbers. The shaded letters indicate similar amino acids. Consensus residues are
amino acids that are similar in more than three sequences. The signature motif for 4'-
phosphopantetheinylation is underlined.

Figures 10A and 10B show the HPLC analysis of BlmI purified from E. coli
OG7001(pBS2) (Fig. 10A) and E. coli OG7001(pBS2/pDPT-Gsp) (Fig. 10B).

Figure 11 shows the enzyme architecture of type I and type II PKS and
NRPS. A, adenylation domain; ACP, acyl carrier protein or ACP domain; AT, acyl
transferase; C, condensation protein or C domain; KS, β-ketoacyl synthase domain; KSα,
β-ketoacyl synthase α subunit; KSβ, β-ketoacyl synthase β subunit; PCP, peptidyl carrier
protein or PCP domain.

Figure 12 illustrates the reaction catalyzed by phosphopantetheinyl 
transferases (PPTases). 

Figure 13 shows a restriction map and gene organization of the pptA locus
from Sv ATCC 15003.

DETAILED DESCRIPTION

Polyketides and polypeptides can be assembled in a remarkably similar
manner by repetitive addition of an extending unit to a growing chain by polyketide
synthases (PKSs) and nonribosomal peptide synthetases (NRPSs), respectively. In the case of
polyketides, the extending unit is typically a fatty acid (activated as an acyl CoA thioester),
while the extending unit for polypeptides is typically an amino acid (activated as an
aminoacyl adenylate). Both the PKS and NRPS systems have evolved a modular
organization to define the number, sequence, and specificity of the incorporation of the
extending unit and utilize the 4'-phosphopantetheine prosthetic group to channel the
growing intermediate during the elongation process.

This invention pertains to the discovery that a PKS-bound growing polyketide
intermediate can be further elongated by an NRPS module, or conversely, an NRPS-bound
growing polypeptide intermediate can be further elongated by a PKS module. This
discovery permits the exploitation of NRPS, PKS, and hybrid NRPS/PKS systems to provide
a number of novel hybrid peptide/polyketide metabolites from amino acids and short fatty
acids.

It was also a discovery of this invention that this hybrid NRPS/PKS/NRPS
system is exemplified by the bleomycin (Blm) biosynthesis pathway in Streptomyces
verticillus (Sv) (ATCC 15003). The bleomycins are a family of glycopeptide-derived
antibiotics originally isolated by Umezawa in 1966 from the fermentation broth of S.
verticillus. Bleomycins (BLMs) exhibit strong anti-tumor activity and are currently used in the
treatment of lymphoma, particularly Hodgkin's disease, testicular tumors, squamous cell
carcinomas of skin, head, cervix, penis, rectum, and for intracavitary therapy of malignant
effusions in ovarian and breast cancer. The commercial product, Blenoxane®, contains
BLM A2 and B2 as the principal constituents. Almost uniquely among anticancer drugs,
BLM does not cause myelosuppression, promoting its wide application in combination
chemotherapy.

In one aspect, this invention provides a cloned and characterized BLM gene
cluster consisting of characteristic NRPS and PKS genes from the Blm producer
Streptoverticillum sp. (ATCC 15003). The cloned and isolated Blm gene cluster provides a
method of recombinantly expressing bleomycin and/or bleomycin analogues. Thus, in one
embodiment, this invention provides for nucleic acids encoding bleomycin synthetic
machinery or subunits thereof, for cells recombinantly modified to express a bleomycin
and/or bleomycin analogue, and for a bleomycin or bleomycin analogue recombinantly
expressed in such cells.

Like other polyketide synthases or nonribosomal peptide synthetases, the
bleomycin synthetic pathway is organized into modules, each module catalyzing the addition
and/or modification of one subunit (e.g. fatty acid or amino acid). Each module is organized
into a number of domains, each domain having a characteristic activity (e.g. activation,
condensation, condensation/cyclization, etc.). The catalytic domains within a module and
the modules themselves are often arranged collinearly, and the order of biosynthetic modules
from NH2- to COOH-terminus on each PKS and NRPS polypeptide and the number and type
of catalytic domains within each determine the order of structural and functional elements in
the resulting product. The size and complexity of the ultimately formed product are
controlled by the number of repeated acyl chain extension steps that are, in turn, a function
of the number and placement of carrier protein domains in these multimodular enzymes.

The number, composition, and order of such domains can be altered either to introduce
modifications, e.g. into the bleomycin to produce bleomycin analogues, or to produce
different or completely new molecules. Such "recombination" is not restricted solely to
recombination among the bleomycin catalytic domains and/or modules, but can also involve
recombination between bleomycin modules and/or subunits and other PKS and/or NRPS
modules and/or subunits. Moreover, the discovery that synthetic pathways can incorporate
both PKS and NRPS modules and/or catalytic domains makes available hybrid PKS/NRPS
syntheses.
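
The modular organization described above can be pictured as an ordered list of modules, each an ordered tuple of domains. The following Python sketch is only an illustration under stated assumptions: the class and function names are hypothetical, the module labels are invented, and the rule that a KS domain marks a PKS module while an A domain marks an NRPS module is a simplification adopted for the example.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Tuple

@dataclass(frozen=True)
class Module:
    name: str                   # illustrative label only
    domains: Tuple[str, ...]    # domain abbreviations, NH2- to COOH-terminal order

    def kind(self) -> str:
        # Simplifying assumption: KS marks a PKS module, A marks an NRPS module.
        if "KS" in self.domains:
            return "PKS"
        if "A" in self.domains:
            return "NRPS"
        return "other"

def system_type(modules: List[Module]) -> str:
    """Collapse runs of same-kind modules into a label such as NRPS/PKS/NRPS."""
    return "/".join(kind for kind, _ in groupby(m.kind() for m in modules))

def swap_module(modules: List[Module], index: int, new: Module) -> List[Module]:
    """Return a new assembly line in which one module has been replaced."""
    return modules[:index] + [new] + modules[index + 1:]

line = [
    Module("nrps_a", ("C", "A", "PCP")),
    Module("pks", ("KS", "AT", "MT", "KR", "ACP")),
    Module("nrps_b", ("C", "A", "PCP")),
]
print(system_type(line))

# Replacing the central PKS module with an NRPS module changes the system type.
variant = swap_module(line, 1, Module("nrps_c", ("C", "A", "PCP")))
print(system_type(variant))
```

Keeping modules immutable and returning a new list from each swap leaves the parent assembly line available for comparison with every variant derived from it.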

Thus, in one embodiment this invention contemplates the use of blm gene 
cluster modules and/or catalytic domains to make various peptide and/or polyketide, and/or 
hybrid polypeptide/polyketide metabolites (including, but not limited to bleomycin
intermediates or shunt metabolites), in combinatorial biosynthesis with other polyketide
synthases and/or other nonribosomal peptide synthetases. 

The blm gene cluster contains several glycosylases which can be used alone 
or in context with other PKS and/or NRPS modules or catalytic domains to make various 
metabolites with sugars associated with bleomycins (bleomycin sugars).

In addition, the blm gene cluster includes a novel methyltransferase domain 
that can be used to make polyketide metabolites with methyl branch(es).

The blm gene cluster also is characterized by the unusual Cy domains as well
as the unprecedented Ox domain (see, e.g., BlmIV and BlmIII NRPSs), providing an efficient
biosynthesis for a bithiazole structure. The blm gene cluster, blm modules, or blm catalytic
domains can be used either individually or collectively (alone or in combinations with other
nonribosomal peptide synthetases or polyketide synthases) to make thiazolidine, thiazoline
and thiazole, bi-thiazolidine, bithiazoline, and bithiazole-containing microbial metabolites.

Other uses include, but are not limited to, the usage of the blm gene
cluster/modules/catalytic units (either individually or collectively) or the Blm model to make
heterocyclic ring-containing microbial metabolites, such as five-member S- and N-containing
compounds of the thiazolidine, thiazoline and thiazole family or the O- and N-containing
compounds of the oxazolidine, oxazoline, and oxazole family, or to make sugars,
such as L-sugars (with the BlmG epimerase), sugars modified by a carbamoyl group (with
BlmD), and disaccharides.

This invention also includes the discovery of a novel discrete PCP protein
(encoded by the blmI gene). Apo-BlmI can be efficiently modified into holo-BlmI either in
vivo or in vitro by PCP-specific 4'-phosphopantetheine transferases (PPTases) such as Gsp
and Sfp. Unlike the PCP domains in type I NRPSs, BlmI lacks its cognate A domain and can
be aminoacylated by Val-A, an A domain from a completely unrelated type I NRPS. BlmI,
therefore, represents the first characterized type II PCP, providing the genetic and
biochemical evidence to support the existence of a type II NRPS. The latter system is
useful, in a manner analogous to the type I NRPS, i.e., modular NRPS, in the combinatorial
manipulation of NRPS proteins to generate novel peptides. This invention also includes the
discovery and characterization of a novel PPTase (encoded by the pptA gene in Figure 13).
This PPTase can be used in engineered biosynthesis of polyketides, peptides, hybrid peptide
and polyketide metabolites, hybrid polyketide and peptide metabolites, or the combination of
both types of metabolites. The PPTase can also be used in converting apo-peptidyl carrier
proteins (both type I and type II) and acyl carrier proteins (both type I and type II) into the
holo-proteins.

The Examples provided herein and the accompanying primers permit one of 
ordinary skill in the art to isolate the blm gene cluster of this invention, its constituent ORFs, 
various modules, or enzymatic domains. The isolated nucleic acid components can be used 
to express one or more polypeptide components for in vivo (e.g. recombinant) synthesis of 
one or more polypeptides and/or polyketides as indicated above. It will also be appreciated 
that the blm cluster polypeptides can be used for ex vivo assembly of various 
macromolecules. 

I. BLM gene cluster and the PPTase gene.

A) The BLM gene cluster.

The nucleic acids comprising the blm gene cluster are identified in Tables I
and II and listed in the sequence listing provided herein (SEQ ID NOS: 1 and 2, GenBank
Accession numbers AF149091, AF210249, AF210311). In particular, Table I identifies
genes and functions of open reading frames (ORFs) responsible for the biosynthesis of the
hybrid peptide/polyketide/peptide backbone and sugar moieties of bleomycin, while Table II
identifies a number of ORFs comprising the blm gene cluster, identifies the activity of the
catalytic domain encoded by the ORF and provides primers for the amplification and
isolation of that ORF.

As illustrated in Example 1, the blm cluster comprises a PKS module, flanked 
by several NRPS modules along with several sugar biosynthesis genes and genes encoding 
other biosynthesis enzymes as well as several resistance and regulatory genes (Table I).



Table I. Determined functions of ORFs in the bleomycin biosynthesis gene cluster

Gene      Amino acids   Sequence homolog(1)                                     Proposed function(2)
-------   -----------   -----------------------------------------------------   ----------------------------------
orf8      424           YqeR (BAAlWl)                                           Oxidase
blmC      498           RfaE (AA079(U.l)                                        NDP-glucose synthase
blmI      90            GrsB (P14688)                                           Type II PCP
blmD      545           NodU (Q53515)                                           Carbamoyl transferase
blmE      390           RfaF (AAD16056)                                         Glycosyl transferase
orf13     187           MbtH (O05821)                                           Unknown
blmII     462           Nrp (CAA98931)                                          NRPS condensation enzyme
orf15     339           SyrP (1890776)                                          Regulation
blmIII    935           HMWP2 (P48633), McbC (P23185)                           A PCP Ox
blmIV     2626          HMWP2 (P48633)                                          C A PCP Cy A PCP Cy
orf18     638           [illegible]                                             Asparagine synthetase
blmF      494           RfbC (Q50864)/BlmOrf1 (567319)                          Glycosyl transferase/β-hydroxylase
blmG      325           Yt^B (2293288)                                          Sugar epimerase
blmV      645           McyB (2708278)                                          PCP C
blmVI     2675          ACoAS (1658531), PksD ([illegible]), SnbDE (CAA67249)   A(3) ACP C A PCP C A
blmVII    1218          SyrE (3510629)                                          C A PCP
blmVIII   1841          HMW1 (CAA73127)                                         KS AT MT KR ACP
blmIX     1066          SafB (1^71128)                                          C A PCP
blmX      2140          TycC (2623773)                                          C A PCP C A PCP
blmXI     688           SyrE (3510629)                                          NRPS condensation enzyme
orf28     239           SC9C7.04C (CAA22716)                                    Unknown
orf29     582           YvdB (CAB08068)                                         Transmembrane transporter
orf30     113           SmtB (P30340)                                           Regulation
orf31     117           PhnA (P16680)                                           Unknown

(1) Protein accession numbers are given in parentheses. (2) Underlined domains contain motifs that are clearly
different from known NRPS or PKS domains. (3) This A domain lacks the typical NRPS A1, A2, A4, A8, and
A9 motifs and more closely resembles acyl CoA synthases. ORF1 to ORF7 were reported by Schmidt (1994)
Gene 151:17-21, who assigned ORF2 as blmA and ORF4 as blmB.



Noteworthy are the genes encoding the NRPS and PKS enzymes. The blmI,
blmII, and blmXI genes encode NRPSs with an unusual architecture. In contrast to all known
NRPSs, which are of modular organization with each module consisting minimally of a
condensation (C), an adenylation (A), and a peptidyl carrier protein (PCP) domain, BlmI,
BlmII and BlmXI are discrete proteins homologous to individual domains of type I NRPSs.
We have characterized BlmI as a type II PCP (Du and Shen (1999) Chem. Biol. 6: 507-517).
The BlmII and BlmXI proteins can serve as candidates for type II condensation enzymes.

The blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX genes encode
modular NRPSs consisting of domains characteristic for known type I NRPSs, such as the A,
PCP, C, and condensation/cyclization (Cy) domains, as well as an unprecedented oxidation
(Ox) domain. BlmVI is unique among all the Blm NRPSs identified. Its N-terminal module
(NRPS-5) consists of an atypical A domain, which bears a close resemblance to a family of
acyl CoA synthases (Fitzmaurice and Kolattukudy (1997) J. Bacteriol. 179: 2608-2615;
Fitzmaurice and Kolattukudy (1998) J. Biol. Chem. 273: 8033-8039), and an acyl carrier
protein (ACP)-like domain. Its C-terminal module is truncated and presumably interacts
with BlmV to constitute the complete NRPS-3 module (Fig. 1B). Also noteworthy are the C
domain of NRPS-3 that lacks both His residues of the conserved HHxxxDG (SEQ ID NO: 4)
active site for transpeptidation (Stachelhaus et al. (1998) J. Biol. Chem. 273: 22773-22781)
and the extra C domain at the C-terminus of BlmV. These unusual features associated with
BlmVI and BlmV may play roles in the formation of the β-aminoalaninamide and the
pyrimidine moieties of BLM, which are unprecedented in peptide biosynthesis.

The blmVIII gene encodes a PKS module consisting of domains characteristic
for known PKSs, such as ketoacyl synthase (KS), acyltransferase (AT), ketoreductase (KR),
and ACP, with malonyl CoA acting as an extending unit according to sequence comparison
of the AT domain (Haydock et al. (1995) FEBS Lett. 374: 246-248) (Fig. 1B).

The identification of an integrated methyltransferase (MT) domain in the
middle of BlmVIII is unique, representing the first PKS from actinomycetes that contains an
internal MT domain.



Table II. Blm gene cluster open reading frames (ORFs) and primers for ORF amplification.

ORF     Position                Activity                                             Method                                           Forward primer      Reverse primer        SEQ ID NOs
------  ----------------------  ---------------------------------------------------  -----------------------------------------------  ------------------  --------------------  ----------
orf-8   76183-77457             Oxygen-independent coproporphyrinogen III oxidase    Gapped-blast comparison (1)                      ATGAGCCACGCCATCGGA  TCAGGCGCGTTCGGGGGC    5, 6
orf-9   74690-76186             ADP-heptose synthase (blmC)                          Gapped-blast comparison (1)                      GTGAACACCGACCTGCCC  TCATGGGGTGTC1 CCC I C 7, 8
orf-10  74421-74693             Peptidyl carrier protein (blmI)                      Expression and biochemical characterization (2)  ATGAGCGCCCCGCGGGGC  TCACCGGTCCCGCTCCCC    9, 10
orf-11  72787-74424             Carbamyltransferase (blmD)                           Gapped-blast comparison (1)                      ATGAGCGCCGACCCGTCC  TCATGAGCGGGCCGCCGT    11, 12
orf-12  71618-72790             ADP-heptose:LPS heptosyl transferase (blmE)          Gapped-blast comparison (1)                      ATGACCACCCCCATGACC  TCATGGGGTACTCCTGAT    13, 14
orf-13  70983-71546             Homolog of mbtH in the synthesis of mycobactin       Gapped-blast comparison (1)                      ATGACCACGACCCCGCGG  TCAGGTGCCGGACACGCG    15, 16
orf-14  69598-70986             Peptide synthetase (condensation, blmII)             Gapped-blast comparison (1)                      GTGACCGCCCCCGGCACA  TCATCGGTGGCTCCTCGT    17, 18
orf-15  68582-69601             Regulatory gene (homolog of syrP)                    Gapped-blast comparison (1)                      GTGAACCGGCACGGCCCC  TCACGCGCTCACCTCGTC    19, 20
orf-16  65778-68585             Mutated peptide synthetase-oxidase (NRPS-0, blmIII)  Gapped-blast comparison (1)                      GTGACGAGCGCCCGGCCC  TCACGGGGCCTCCGTGCG    21, 22
orf-17  57901-65781             Peptide synthetase (NRPS-2-1, blmIV)                 Expression and biochemical characterization (2)  ATGCTGCACGGCGCCGCG  TCACTCCGGTCCACCTCC    23, 24
orf-18  55899-57815             Asparagine synthetase                                Gapped-blast comparison (1)                      GTGAGGCCCGTGTGCGGC  TCAGCCACCGTTGCCGCC    25, 26
orf-19  54418-55902             Homolog of hydroxylase/dehydrogenase (blmF)          Gapped-blast comparison (1)                      GTGAAGGACCTCGGCCGG  TCACTCCCCCGGTGCCGG    27, 28
orf-20  53427-54404             Nucleotide-sugar epimerase (blmG)                    Gapped-blast comparison (1)                      GTGACATGGACCGTGGTG  TCAGGCATCGGCCCTCCC    29, 30
orf-21  51493-53430             Peptide synthetase (NRPS-3CT, blmV)                  Gapped-blast comparison (1)                      ATGCGCGGGCATGACGAC  TCACGGTGTCTCTCCCTC    31, 32
orf-22  43263-51290             Peptide synthetase (NRPS-5-4-3, blmVI)               Expression and biochemical characterization (2)  ATGAGCCGGCCGGCCGGC  [illegible]           33, 34
orf-23  39610-43266             Peptide synthetase (NRPS-6, blmVII)                  Expression and biochemical characterization (2)  GTGACCACGCCCCGCATC  TCATTCGGGACGCGGGCA    35, 36
orf-24  34088-39613             Polyketide synthase (blmVIII)                        Gapped-blast comparison (1)                      ATGAGCCATGCCGACGCG  TCACAGCACCACCTCTTC    37, 38
orf-25  30891-34091             Peptide synthetase (NRPS-7, blmIX)                   Gapped-blast comparison (1)                      ATGACCCCGGCCGCCGAC  TCATCGTCCGCCGCCTTT    39, 40
orf-26  24406-30894             Peptide synthetase (NRPS-9-8, blmX)                  Gapped-blast comparison (1)                      ATGCCTCGGTGTGCCCGA  TCATTCGGCGGCACCTCC    41, 42
orf-27  22127-24193             Peptide synthetase (condensation, blmXI)             Gapped-blast comparison (1)                      GTGGGTTTCCGTCGAGCG  TTACACCCTCCGTTTCTC    43, 44
orf-28  21367-22086             Phosphatidylserine decarboxylase                     Gapped-blast comparison (1)                      ATGGCACAGGACCTGAAC  TCAACGCCACCGGATCTT    45, 46
orf-29  19161-20909             Transmembrane transporter                            Gapped-blast comparison (1)                      GTGAGCTCCCTCGCCGTC  TCATCGTCGGGCACTCGG    47, 48
orf-30  18823-19164             Metal dependent regulatory element                   Gapped-blast comparison (1)                      GTGCCGGTTCCGCTGTAT  TCACCGGGCACTGACCTC    49, 50
orf-31  18660-18307             PHNA homolog                                         Gapped-blast comparison (1)                      GTGACCGAGAACCTTCCG  TCAGACCTTCTTGACCAC    51, 52
orf-32  17736-9211              Peptide synthetase (NRPS-11-10)                      Gapped-blast comparison (1)                      ATGGCCTCAGACGCTTTG  TCATTGAGACTCCTCCTC    53, 54
orf-33  9214-7859               Putative transporter                                 Gapped-blast comparison (1)                      ATGATGAAGTCAAGCCGC  TCAGTGGCTTACAAGGAG    55, 56
orf-34  7797-6784               Homolog of clavaminic acid synthase                  Gapped-blast comparison (1)                      ATGACTGACCTGCCGTTG  TCACACCAGCAGCGAGGT    57, 58
orf-35  6773-6021               Thioesterase                                         Gapped-blast comparison (1)                      ATGGATTTCCCCCTCACC  TCATGCCCCTACCTCGGC    59, 60
orf-36  6024-4741               Putative transporter                                 Gapped-blast comparison (1)                      ATGACCGCGCGCGTCGAC  TCACTCCTCGGCTTCGGC    61, 62
orf-37  4733-3915               Unknown                                              Gapped-blast comparison (1)                      GTGTCCAAGAACGCGGCG  TCATCGGCTCGCCTCGTG    63, 64
orf-38  3918-2182               Peptide synthetase (NRPS-12)                         Gapped-blast comparison (1)                      ATGACCCTCACCCTGCGG  TCACTCGGGCACTCCTTC    65, 66
orf-39  2185-1199               Regulatory gene (homolog of SyrP)                    Gapped-blast comparison (1)                      GTGACCGGTTCCGTAACG  TCATGAGTCCGCCGAGGT    67, 68
orf-40  1015-1                  Peptide synthetase                                   Gapped-blast comparison (1)                      ATGACAGAGGTCCGAGGT  CCCGGCAACCGCCCTCCC    69, 70
orf-41  On a separate sequence  Phosphopantetheinyl transferase (pptA)               Expression and biochemical characterization (2)  GTGATCGCCGCCCTCCTG  TTACGGGACGGCGGTCCG    71, 72









The Blm megasynthetase comprises nine NRPS modules and one PKS
module forming a hybrid NRPS/PKS/NRPS megasynthetase (Fig. 1A). Inspection of the blm
gene cluster (Fig. 2) showed that the Blm NRPS and PKS modules apparently are not
organized according to the "colinearity rule" for BLM biosynthesis (Fig. 1). Detailed
functional organization of the megasynthetase and the BLM synthetic pathway is provided in
Example 1.
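
The ORF positions listed in Table II can be cross-checked against the amino acid counts listed in Table I. The Python sketch below is an illustration only; it assumes that the positions are inclusive nucleotide coordinates that include the stop codon, and the function name is hypothetical.

```python
def orf_amino_acids(start: int, end: int) -> int:
    """Predicted protein length from the first and last nucleotide of an ORF.

    Assumptions: coordinates are inclusive and include the stop codon.
    Either orientation (start < end or start > end) is accepted.
    """
    length = abs(end - start) + 1
    if length % 3 != 0:
        raise ValueError(f"ORF length {length} is not a multiple of three")
    return length // 3 - 1    # the stop codon does not encode an amino acid

# (position from Table II, amino acid count from Table I)
examples = {
    "orf-8": ((76183, 77457), 424),
    "orf-10 (blmI)": ((74421, 74693), 90),
    "orf-24 (blmVIII)": ((34088, 39613), 1841),
    "orf-31": ((18660, 18307), 117),
}
for name, ((start, end), listed) in examples.items():
    print(name, orf_amino_acids(start, end), listed)
```

For the four ORFs shown, the computed length agrees with the count listed in Table I; a disagreement for another ORF would point to a transcription problem in one of the two tables rather than to a property of the gene.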

B) PPTase.

This invention also provides the gene (pptA, Fig. 13) encoding
phosphopantetheine transferase (PPTase) (GenBank Accession No: AF210311) (see SEQ ID
NO: 3). PPTase converts carrier proteins for the growing acyl chain from inactive apo-forms
to functional holo-forms by the covalent attachment of the 4'-phosphopantetheine moiety of
coenzyme A to a conserved serine residue of the carrier-protein substrate (see, e.g., Fig. 1A).

Using the sequence information provided herein (e.g. primer sequences and
PPTase sequence) the PPTase nucleic acids can be routinely isolated according to standard
methods (e.g. PCR amplification). Detailed protocols for the isolation of the PPTase are
provided in Example 3.

Other PPTases can be identified using the probes and primers illustrated in 
Example 3. Briefly, a primer to the THC motif (5'-C GGC ATG GTC GGC TCC HTN 
CAN CAY TG-3' (SEQ ID NO: 73), where H = C + A, N = A + C + T + G, Y = C + T, K = G 
+ T, R = A + G, W = T + A) is paired with a primer designed around the typical C-terminal 
PPTase motif (e.g., KEA-1: 5'-T GCA GCA GAA CAG GAG GCK NYC CCA NKG-3' (SEQ ID 
NO: 74) and KEA-2: 5'-TG GGT CAG CGG GTA CCA NRC YTT RWA-3' (SEQ ID NO: 
75)). Using S. verticillus chromosomal DNA as template and the primer set THC/KEA-2, 
a probe (about 250 bp) can be amplified that specifically binds to a PPTase. Libraries of 
organisms comprising NRPS, PKS, and/or hybrid PKS/NRPS pathways can be probed for 
the presence of a PPTase sequence. Once hybridizing clones are identified, the PPTase 
sequence can be isolated according to standard methods well known to those of skill in the art 
(see, e.g., Example 3). 
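
The degeneracy of the primers above can be illustrated with a short sketch. It uses the degeneracy codes exactly as they are defined in the text (where H = C + A; the standard IUPAC table instead defines H as A, C, or T) and counts how many distinct oligonucleotides each degenerate primer represents. The function names `degeneracy` and `expand` are illustrative only and are not part of the disclosure.

```python
from itertools import product

# Degeneracy codes as defined in the text above. Note that the text gives
# H = C + A, whereas the standard IUPAC table defines H as A, C, or T.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "H": "CA", "N": "ACTG", "Y": "CT", "K": "GT", "R": "AG", "W": "TA",
}

# Primer sequences from the text with the spacing removed.
THC = "CGGCATGGTCGGCTCCHTNCANCAYTG"    # SEQ ID NO: 73
KEA2 = "TGGGTCAGCGGGTACCANRCYTTRWA"    # KEA-2


def degeneracy(primer):
    """Number of distinct sequences encoded by a degenerate primer."""
    count = 1
    for base in primer:
        count *= len(CODES[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    pools = [CODES[base] for base in primer]
    return ["".join(bases) for bases in product(*pools)]


if __name__ == "__main__":
    for name, seq in (("THC", THC), ("KEA-2", KEA2)):
        print(name, len(seq), "nt, degeneracy", degeneracy(seq))
```

Under the text's definitions each of the two primers is a pool of 64 sequences; with the IUPAC meaning of H the THC pool would be larger.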
C) Isolation/preparation of nucleic acids. 

In one embodiment, this invention provides nucleic acids for the recombinant 
expression of a bleomycin. Such nucleic acids include isolated gene cluster(s) comprising 
open reading frames encoding polypeptides sufficient to direct the assembly of a bleomycin. 

In other embodiments of this invention, modified bleomycins (e.g. bleomycin 
analogs), novel polyketides, polypeptides, and combinations thereof (polyketide/polypeptide 
hybrids) are created by modifying PKSs and/or NRPSs so as to introduce variations into 
known polymers synthesized by the enzymes. Such variations may be introduced by design, 
for example to modify a known molecule in a specific way, e.g. by replacing a single 
monomeric unit within a polymer with another, thereby creating a derivative molecule of 
predicted structure. Alternatively, variations can be made randomly, for example by making 
a library of molecular variants of a known polymer by systematically or haphazardly 
replacing one or more modules or enzymatic domains in a known PKS or NRPS with a 
collection of alternative modules or domains. Production of alternative/modified PKSs, 
NRPSs and hybrid systems is described below. 

Using the primer and sequence information provided herein, one of ordinary 
skill in the art can routinely isolate/clone the PKS and/or NRPS modules and/or enzymatic 
domains described herein. For example, the PCR primers provided in Table II, above, can 
be used to amplify any of the orfs identified therein. Moreover, using the sequence 
information for the blm gene cluster provided herein, the design of other primers suitable for 
the amplification of individual ORFs, combinations of ORFs, genes, etc. is routine. 

Typically such amplifications will utilize the DNA of an organism containing 
the requisite genes (e.g. Streptomyces verticillus) as a template. Typical amplification 
conditions include a PCR mixture consisting of 5 ng of S. verticillus genomic or plasmid 
DNA as template, 25 pmoles of each primer, 25 µM dNTP, 5% DMSO, 2 units of Taq 
polymerase, 1 x buffer, with or without 20% glycerol in a final volume of 50 µL. PCR is 
carried out (e.g. on a GeneAmp PCR System 2400 (Perkin-Elmer/ABI)) with a cycling 
scheme as follows: initial denaturing at 94°C for 5 min, 24-36 cycles of 45 sec at 94°C, 1 
min at 60°C, 2 min at 72°C, followed by an additional 7 min at 72°C. One of skill will 
appreciate that optimization of such a protocol, e.g. to improve yield, etc. is routine (see, e.g., 
U.S. Patent No. 4,683,202; Innis (1990) PCR Protocols: A Guide to Methods and 
Applications, Academic Press Inc., San Diego, CA, etc.). In addition, primers may be designed 
to introduce restriction sites and so facilitate cloning of the amplified sequence into a vector. 
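
As a planning aid, the cycling scheme above can be written down as data and its total hold time computed. This is a minimal sketch: the step labels ("anneal", "extend") are conventional names assumed here rather than taken from the text, and instrument ramp times between temperatures are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str      # conventional step name (an assumption, not from the text)
    temp_c: int     # hold temperature in degrees Celsius
    seconds: int    # hold time in seconds


# Values taken from the cycling scheme described in the text.
INITIAL = Step("initial denaturation", 94, 5 * 60)
CYCLE = (
    Step("denature", 94, 45),
    Step("anneal", 60, 60),
    Step("extend", 72, 2 * 60),
)
FINAL = Step("final extension", 72, 7 * 60)


def hold_time_seconds(cycles):
    """Total programmed hold time, excluding ramping between temperatures."""
    if not 24 <= cycles <= 36:
        raise ValueError("the text specifies 24-36 cycles")
    per_cycle = sum(step.seconds for step in CYCLE)
    return INITIAL.seconds + cycles * per_cycle + FINAL.seconds


if __name__ == "__main__":
    for n in (24, 30, 36):
        print(n, "cycles:", hold_time_seconds(n) / 60, "min")
```

The programmed holds therefore span roughly 102 min at 24 cycles to 147 min at 36 cycles, before ramp time is added.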
Using the information provided herein, other approaches to cloning the desired 
sequences will be apparent to those of skill in the art. For example, the PKS or NRPS 
modules or enzymatic domains of interest can be obtained from an organism that expresses 
the same, using recombinant methods, such as by screening cDNA or genomic libraries 
derived from cells expressing the gene, or by deriving the gene from a vector known to 
include the same. The gene can then be isolated and combined with other desired NRPS 
and/or PKS modules or domains, using standard techniques. If the gene in question is 
already present in a suitable expression vector, it can be combined in situ, with, e.g., other 
PKS subunits, as desired. The gene of interest can also be produced synthetically, rather 
than cloned. The nucleotide sequence can be designed with the appropriate codons for the 
particular amino acid sequence desired. In general, one will select preferred codons for the 
intended host in which the sequence will be expressed. The complete sequence can be 
assembled from overlapping oligonucleotides prepared by standard methods and assembled 
into a complete coding sequence (see, e.g., Edge (1981) Nature 292:756; Nambiar et al. 
(1984) Science 223: 1299; Jay et al. (1984) J. Biol. Chem. 259:6311). In addition, it is noted 
that custom gene synthesis is commercially available (see, e.g., Operon Technologies, 
Alameda, CA). 
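
The idea of designing a synthetic gene with preferred codons can be sketched as a simple back-translation. The codon table below is an illustrative assumption only: it picks one G/C-ending codon per amino acid, as might suit a high-GC host, and is not a measured codon-usage table for any organism. The function name `back_translate` is likewise hypothetical.

```python
# One codon per amino acid. ILLUSTRATIVE ONLY: these are valid standard-code
# codons chosen to end in G or C; a real design would use a measured
# codon-usage table for the intended host.
PREFERRED_CODON = {
    "A": "GCC", "R": "CGC", "N": "AAC", "D": "GAC", "C": "TGC",
    "Q": "CAG", "E": "GAG", "G": "GGC", "H": "CAC", "I": "ATC",
    "L": "CTG", "K": "AAG", "M": "ATG", "F": "TTC", "P": "CCG",
    "S": "TCC", "T": "ACC", "W": "TGG", "Y": "TAC", "V": "GTC",
}
STOP_CODON = "TGA"


def back_translate(protein):
    """Return a coding sequence for `protein` (one-letter codes) plus a stop."""
    try:
        codons = [PREFERRED_CODON[aa] for aa in protein.upper()]
    except KeyError as err:
        raise ValueError(f"unknown amino acid code: {err.args[0]}") from None
    return "".join(codons) + STOP_CODON


if __name__ == "__main__":
    print(back_translate("MKV"))
```

The resulting sequence would then be split into overlapping oligonucleotides for assembly, a step not shown here.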

Examples of such techniques and instructions sufficient to direct persons of 
skill through many cloning exercises are found in Berger and Kimmel (1989) Guide to 
Molecular Cloning Techniques, Methods in Enzymology 152, Academic Press, Inc., San 
Diego, CA (Berger); Sambrook et al. (1989) Molecular Cloning - A Laboratory Manual (2nd 
ed.) Vol. 1-3, Cold Spring Harbor Laboratory, Cold Spring Harbor Press, NY; Ausubel 
(1994) Current Protocols in Molecular Biology, Current Protocols, a joint venture between 
Greene Publishing Associates, Inc. and John Wiley & Sons, Inc.; U.S. Patent 5,017,478; and 
European Patent No. 0,246,864. 

II. Expression of blm gene clusters, modules, and enzymatic domains. 

As indicated above, in one embodiment this invention provides novel NRPS 
and PKS genes for the efficient recombinant production of both novel and known 
polyketides, peptides, and polyketide/polypeptide hybrids by expressing them in vivo. In 
other embodiments, such syntheses are carried out in vitro. Even in vitro syntheses, 
however, typically utilize recombinantly expressed PKSs, NRPSs, or enzymatic domains 
thereof. Thus, it is frequently desirable to express protein components of the PKSs or NRPSs 
described above. 

Typically expression of the protein components of the pathway and/or of the 
products of the NRPS/PKS pathway is accomplished by placing the subject PKS or NRPS 
nucleic acid(s) in an expression vector, and transfecting a cell with the vector such that the 
cell expresses the desired product(s). 

A) Expression vectors 

The choice of vector depends on the sequence(s) that are to be expressed. 
Any transducible cloning vector can be used as a cloning vector for the nucleic acid 
constructs of this invention. However, where large clusters are to be expressed, it is 
preferred that phagemids, cosmids, P1s, YACs, BACs, PACs, HACs or similar cloning 
vectors be used for cloning the nucleotide sequences into the host cell. Phagemids, cosmids, 
and BACs, for example, are advantageous vectors due to the ability to insert and stably 
propagate therein larger fragments of DNA than in M13 phage and lambda phage, 
respectively. Phagemids which will find use in this method generally include hybrids 
between plasmids and filamentous phage cloning vehicles. Cosmids which will find use in 
this method generally include lambda phage-based vectors into which cos sites have been 
inserted. Recipient pool cloning vectors can be any suitable plasmid. The cloning vectors 
into which pools of mutants are inserted may be identical or may be constructed to harbor 
and express different genetic markers (see, e.g., Sambrook et al., supra). The utility of 
employing such vectors having different marker genes may be exploited to facilitate a 
determination of successful transduction. 

In preferred embodiments of this invention, vectors are used to introduce 
PKS, NRPS, or NRPS/PKS genes or gene clusters into host (e.g. Streptomyces) cells. 
Numerous vectors for use in particular host cells are well known to those of skill in the art. 
For example, Malpartida and Hopwood (1984) Nature, 309: 462-464; Kao et al. (1994) 
Science, 265: 509-512; and Hopwood et al. (1987) Methods Enzymol., 153: 116-166 
all describe vectors for use in various Streptomyces hosts. 

In a preferred embodiment, Streptomyces vectors are used that include 
sequences that allow their introduction and maintenance in E. coli. Such Streptomyces/E. 
coli shuttle vectors have been described (see, for example, Vara et al. (1989) J. Bacteriol., 
171: 5872-5881; Guilfoile & Hutchinson (1991) Proc. Natl. Acad. Sci. USA, 88: 8553-8557). 

The gene sequences, or fragments thereof, which collectively encode a PKS 
and/or NRPS and/or PKS/NRPS gene cluster, one or more ORFs, one or more modules, or 
one or more enzymatic domains of this invention, can be inserted into one or more 
expression vectors, using methods known to those of skill in the art. Expression vectors will 
include control sequences operably linked to the desired NRPS and/or PKS coding sequence 
or fragment thereof. Suitable expression systems for use with the present invention include 
systems that function in eucaryotic and prokaryotic host cells. However, as explained above, 
prokaryotic systems are preferred, and in particular, systems compatible with Streptomyces 
spp. are of particular interest. Control elements for use in such systems include promoters, 
optionally containing operator sequences, and ribosome binding sites. Particularly useful 
promoters include control sequences derived from PKS and/or NRPS gene clusters, such as 
one or more act promoters. However, other bacterial promoters, such as those derived from 
sugar metabolizing enzymes, such as galactose, lactose (lac) and maltose, will also find use 
in the present constructs. Additional examples include promoter sequences derived from 
biosynthetic enzymes such as tryptophan (trp), the beta-lactamase (bla) promoter system, 
bacteriophage lambda PL, and T5. In addition, synthetic promoters, such as the tac promoter 
(U.S. Patent 4,551,433), which do not occur in nature also function in bacterial host cells. In 
Streptomyces, numerous promoters have been described including constitutive promoters, 
such as ermE and tcmG (Shen and Hutchinson (1994) J. Biol. Chem. 269: 30726-30733), as 
well as controllable promoters such as actI and actIII (Pieper et al. (1995) Nature, 378: 
263-266; Pieper et al. (1995) J. Am. Chem. Soc., 117: 11373-11374; and Wiesmann et al. 
(1995) Chem. & Biol. 2: 583-589). 

Other regulatory sequences may also be desirable which allow for regulation 
of expression of the PKS replacement sequences relative to the growth of the host cell. 
Regulatory sequences are known to those of skill in the art, and examples include those 
which cause the expression of a gene to be turned on or off in response to a chemical or 
physical stimulus, including the presence of a regulatory compound. Other types of 
regulatory elements may also be present in the vector, for example, enhancer sequences. 

Selectable markers can also be included in the recombinant expression 
vectors. A variety of markers are known which are useful in selecting for transformed cell 
lines and generally comprise a gene whose expression confers a selectable phenotype on 
transformed cells when the cells are grown in an appropriate selective medium. Such 
markers include, for example, genes that confer antibiotic resistance or sensitivity to the 
plasmid. Alternatively, several polyketides are naturally colored and this characteristic 
provides a built-in marker for selecting cells successfully transformed by the present 
constructs. 

The various PKS and/or NRPS clusters or subunits of interest can be cloned 
into one or more recombinant vectors as individual cassettes, with separate control elements, 
or under the control of, e.g., a single promoter. The PKS and/or NRPS subunits can include 
flanking restriction sites to allow for the easy deletion and insertion of other PKS subunits so 
that hybrid PKSs can be generated. The design of such unique restriction sites is known to 
those of skill in the art and can be accomplished using the techniques described above, such 
as site-directed mutagenesis and PCR. 

Methods of cloning and expressing large nucleic acids such as gene clusters, 
including PKS- or NRPS-encoding gene clusters, in cells including Streptomyces are well 
known to those of skill in the art (see, e.g., Stutzman-Engwall and Hutchinson (1989) Proc. 
Natl. Acad. Sci. USA, 86: 3135-3139; Motamedi and Hutchinson (1987) Proc. Natl. Acad. 
Sci. USA, 84: 4445-4449; Grim et al. (1994) Gene, 151: 1-10; Kao et al. (1994) Science, 
265: 509-512; and Hopwood et al. (1987) Meth. Enzymol., 153: 116-166). In some 
examples, nucleic acid sequences of well over 100 kb have been introduced into cells, 
including prokaryotic cells, using vector-based methods (see, for example, Osoegawa et al. 
(1998) Genomics, 52: 1-8; Woon et al. (1998) Genomics, 50: 306-316; Huang et al. (1996) 
Nucl. Acids Res., 24: 4202-4209). In addition, the cloning and overexpression of NRPS-1 
and NRPS-6 is illustrated in Example 1. 

In certain embodiments this invention may make use of genetically 
engineered cells that either lack PKS and/or NRPS genes or have their naturally occurring 
PKS and/or NRPS genes substantially deleted. These host cells can be transformed with 
recombinant vectors, encoding a variety of PKS and/or NRPS gene clusters, for the 
production of active polyketides. The invention provides for the production of significant 
quantities of product, e.g. a bleomycin, at an appropriate stage of the growth cycle. The 
BLMs or other hybrid polyketide/peptide metabolites so produced can be used as therapeutic 
agents, to treat a number of disorders, depending on the type of metabolites in question. For 
example, several of the polyketides and peptides produced by the present method will find 
use as immunosuppressants, as anti-tumor agents, as well as for the treatment of viral, 
bacterial and parasitic infections. The ability to recombinantly produce polyketides and 
peptides also provides a powerful tool for characterizing PKSs and/or NRPSs and the 
mechanism of their actions. 

B) Host cells. 

The vectors described above can be used to express various protein 
components of the polyketide and/or polypeptide synthetic modules for subsequent isolation 
and/or to provide a biological synthesis of one or more desired biomolecules (e.g. 
polyketides, peptides, etc.). Where one or more proteins of the blm cluster are expressed 
(e.g. overexpressed) for subsequent isolation and/or characterization, the proteins are 
expressed in any prokaryotic or eukaryotic cell suitable for protein expression. In one 
preferred embodiment, the proteins are expressed in E. coli. Overexpression of blmI in E. 
coli is described in Example 2. 

Host cells for the recombinant production of the subject polyketides can be 
derived from any organism with the capability of harboring a recombinant PKS, NRPS or 
PKS/NRPS gene cluster. Thus, the host cells of the present invention can be derived from 
either prokaryotic or eucaryotic organisms. However, preferred host cells are those 
constructed from the actinomycetes, a class of mycelial bacteria which are abundant 
producers of a number of polyketides and peptides. A particularly preferred genus for use 
with the present system is Streptomyces. Thus, for example, S. verticillus, S. ambofaciens, S. 
avermitilis, S. azureus, S. cinnamonensis, S. coelicolor, S. curacoi, S. erythraeus, S. fradiae, 
S. galilaeus, S. glaucescens, S. hygroscopicus, S. lividans, S. parvulus, S. peucetius, S. 
rimosus, S. roseofulvus, S. thermotolerans, S. violaceoruber, among others, will provide 
convenient host cells for the subject invention, with S. coelicolor being preferred (see, e.g., 
Hopwood, D. A. and Sherman, D. H. Ann. Rev. Genet. (1990) 24: 37-66; O'Hagan, D. The 
Polyketide Metabolites (Ellis Horwood Limited, 1991), for a description of various 
polyketide-producing organisms and their natural products). 

In a preferred embodiment, the above-described cells are genetically 
engineered by deleting one or more naturally occurring PKS and/or NRPS genes therefrom, 
using standard techniques, such as by homologous recombination (see, e.g., Khosla et al. 
(1992) Molec. Microbiol. 6: 3237). 

In certain embodiments, a eukaryotic host cell is preferred (e.g. where certain 
glycosylation patterns are desired). Suitable eukaryotic host cells are well known to those of 
skill in the art. Such eukaryotic cells include, but are not limited to, yeast cells, insect cells, 
plant cells, fungal cells, and various mammalian cells (e.g. COS, CHO, and HeLa cell lines 
and various myeloma cell lines). 

C) Protein/polyketide recovery. 

Polypeptide and/or polyketide recovery is accomplished according to standard 
methods well known to those of skill in the art. Thus, for example, where blm cluster 
proteins are to be expressed and isolated, the proteins can be expressed with a convenient tag 
to facilitate isolation (e.g. a His6 tag). Other standard protein purification techniques are 
suitable and well known to those of skill in the art (see, e.g., Quadri et al. (1998) 
Biochemistry 37: 1585-1595; Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321, etc.). 

Similarly, where components (e.g. modules and/or enzymatic domains) of the 
blm cluster are used to express various biomolecules (e.g. polyketides, sugars, polypeptides, 
etc.), the desired product and/or shunt metabolite(s) are isolated according to standard 
methods well known to those of skill in the art (see, e.g., Carreras and Khosla (1998) 
Purification and in vitro reconstitution of the essential protein components of an aromatic 
polyketide synthase. Biochemistry 37: 2084-2088; Deutscher (1990) Methods in Enzymology 
Volume 182: Guide to Protein Purification, M. Deutscher, ed.). 

III. Synthesis of recombinant bleomycins. 

In one embodiment this invention provides methods of synthesizing 
bleomycins and recombinantly synthesized bleomycins. As indicated above, this is generally 
accomplished by providing an organism (e.g. a bacterial cell) containing sufficient 
components of the blm gene cluster to direct synthesis of a complete bleomycin. 

In one embodiment, the entire blm cluster is cloned into a Streptomyces strain 
(e.g., S. lividans or S. coelicolor). Kao et al. (1994) Science, 265: 509-512, have cloned the 
30 kb DEBS genes from Sacc. erythraea into S. coelicolor and produced 
6-deoxyerythronolide B in S. coelicolor, and these methods can be used to construct an 
expression plasmid for heterologous expression of the blm cluster. This method involves the 
transfer of DNA between a temperature-sensitive plasmid and a shuttle vector by means of a 
homologous double recombination event in E. coli (Id.). In a preferred embodiment, the two 
ends spanning the blm cluster are cloned into a temperature-sensitive plasmid that is 
chloramphenicol resistant (CmR) such as pCK6. S. verticillus DNA is then rescued from a 
donor into the temperature-sensitive recipient by co-transforming E. coli with the CmR 
recipient plasmid and the apramycin resistant (ApR) pKC505 donor cosmid that contains the 
blm gene cluster, followed by chloramphenicol and apramycin selection at 30°C. Colonies 
harboring both plasmids (CmR, ApR) are shifted to 44°C on chloramphenicol and 
apramycin plates, where only those cointegrates formed by a single recombination event 
between the two plasmids are viable. Surviving colonies are then propagated at 30°C on 
CmR plates to select for recombinant plasmids formed by the resolution of cointegrates 
through a second recombination event. The desired blm cluster is thereby cloned into the 
CmR temperature-sensitive plasmid and is ready to be moved into any expression plasmid by 
similar means of homologous recombination. 

For example, if pWHM861 is the choice of shuttle plasmid for the expression 
of the blm cluster in S. lividans (Meurer and Hutchinson (1995) J. Bacteriol., 177: 477-481), 
the two ends spanning the blm cluster are cloned downstream of the ErmE* promoter in the 
ampicillin resistant (AmR) plasmid pWHM861. The resulting plasmid is co-transformed 
with the temperature-sensitive plasmid containing the blm cluster described above into E. 
coli under the selection of chloramphenicol and ampicillin at 30°C. These CmR and AmR 
colonies are shifted to 44°C on chloramphenicol and ampicillin plates to undergo a single 
recombination event, and the surviving colonies are resolved on ampicillin plates at 30°C by 
completing the double recombination process. The resulting plasmid is suitable for 
transformation into S. lividans by selection with thiostrepton, in which the expression of the 
desired blm cluster is under the control of the ErmE* promoter. The S. lividans 
transformants are cultured and any metabolites produced are isolated and characterized. 

Once production of BLM in S. lividans is established, mutated alleles of the 
blm synthetase can be introduced into the blm cluster for the production of BLM analogs. 

IV. Altered endogenous expression of bleomycins. 

Using the blm gene cluster information provided herein, one of skill in the art 
may regulate the synthesis of endogenous bleomycin. The expression of various ORFs 
comprising the blm gene cluster may be increased or decreased to alter bleomycin synthesis 
levels. 

Methods of altering the expression of endogenous genes are well known to 
those of skill in the art. Typically such methods involve altering or replacing all or a portion 
of the regulatory sequences controlling expression of the particular gene that is to be 
regulated. In a preferred embodiment, the regulatory sequences (e.g., the native promoter) 
upstream of one or more of the blm ORFs are altered. 

This is typically accomplished by the use of homologous recombination to 
introduce a heterologous nucleic acid into the native regulatory sequences. To downregulate 
expression of one or more blm ORFs, simple mutations that either alter the reading frame or 
disrupt the promoter are suitable. To upregulate expression of the blm ORF(s) the native 
promoter(s) can be substituted with heterologous promoter(s) that induce higher than normal 
levels of transcription. 

In a particularly preferred embodiment, nucleic acid sequences comprising the 
structural gene in question or upstream sequences are utilized for targeting heterologous 
recombination constructs. 

The use of homologous recombination to alter expression of endogenous 
genes is described in detail in U.S. Patent 5,272,071, WO 91/09955, WO 93/09222, WO 
96/29411, WO 95/31560, and WO 91/12650. 

V. Synthesis of BLM analogs. 

In one embodiment, this invention provides methods of synthesizing 
modified bleomycins or bleomycin analogs. In preferred embodiments, the BLM analogs are 
synthesized either by introducing specific perturbations into individual NRPS and/or PKS 
enzymatic domains or modules, or by reprogramming the linear order in which the NRPS or 
PKS enzymatic domains and/or modules appear in the blm synthetase genes. The former 
will lead to BLM analogs with targeted modifications at the BLM backbone and the latter 
will allow incorporation of other extension units in variable sequence into the biosynthesis of 
BLM. In particularly preferred embodiments, the genetically modified blm synthetases are 
produced in S. verticillus; however, it will be recognized that the entire blm gene cluster can 
be cloned into other hosts, e.g. into S. lividans or S. coelicolor. 

In preferred embodiments, modification of the blm gene cluster to yield BLM 
analogues is accomplished by one of two different approaches. In one approach, the BLM 
enzymatic domains and/or modules are altered in a directed manner (i.e. they are 
changed in a preselected way), while in another approach, random/haphazard alterations are 
introduced into the blm cluster and the resulting products are screened to identify those with 
desired properties. 

A) Synthesis of BLM analogs by specific engineering of the blm synthetase 
genes. 

The blm synthetase genes can be re-engineered by means of specific 
mutations or by reprogramming the linear order of the NRPS or PKS enzymatic domains or 
modules. In this approach, a wild-type blm synthetase allele is replaced with these mutants 
and expressed in an appropriate host (e.g., S. verticillus or a heterologous host). Since 
both NRPSs (Stachelhaus et al. (1995) Science, 269: 69-72) and PKSs (Donadio et al. (1993) 
Proc. Natl. Acad. Sci. USA, 90: 7119-7123; Donadio et al. (1995) J. Am. Chem. Soc., 117: 
9105-9106; Cortes et al. (1995) Science, 268: 1487-1489) have shown considerable tolerance 
to reprogramming, it is expected that these modifications of the BLM synthetase will result 
in the production of BLM analogs with predicted structural alterations. For example, 
targeted modification at the (2S,3S,4R)-4-amino-3-hydroxy-2-methylpentanoic acid (AHM) 
moiety of BLM can be accomplished by introduction of mutations into the BlmVIII PKS 
module of the BLM synthetase locus. Inactivation of the MT or KR motif by in-frame 
deletion or site-directed mutagenesis will result in the production of BLM analogs containing 
a demethyl-AHM, oxo-AHM, or oxo-demethyl-AHM moiety, etc. 
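
The relationship between the inactivated PKS domain and the expected AHM variant can be summarized in a small lookup. The pairing used here (MT inactivation giving the demethyl form, KR inactivation giving the oxo form, and both giving the oxo-demethyl form) is an inference from the sentence above and from the usual roles of methyltransferase and ketoreductase domains; it is a sketch, not an experimentally confirmed table, and the function name is hypothetical.

```python
# Assumed mapping: an inactive KR leaves the keto group unreduced ("oxo"),
# and an inactive MT omits the methyl group ("demethyl").
PREFIX_FOR_INACTIVE_DOMAIN = {"KR": "oxo", "MT": "demethyl"}


def predicted_ahm_moiety(inactivated):
    """Name the expected AHM variant for a set of inactivated domains."""
    inactivated = set(inactivated)
    unknown = inactivated - set(PREFIX_FOR_INACTIVE_DOMAIN)
    if unknown:
        raise ValueError(f"no prediction for domains: {sorted(unknown)}")
    # Fixed KR-then-MT order reproduces the naming used in the text.
    prefixes = [
        PREFIX_FOR_INACTIVE_DOMAIN[d] for d in ("KR", "MT") if d in inactivated
    ]
    return "-".join(prefixes + ["AHM"])


if __name__ == "__main__":
    for domains in ((), ("MT",), ("KR",), ("MT", "KR")):
        print(domains, "->", predicted_ahm_moiety(domains))
```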

Alternatively, individual functional NRPS domains and/or the PKS module 
can be deleted or the PKS module can be duplicated in-frame to produce BLM analogs with 
shorter or longer backbone, respectively. Alternatively, or in addition, the NRPS domains or 
the PKS module can be rearranged for the production of BLM analogs with a completely 
different backbone. The NRPS and PKS features can be combined into one integrated 
system, providing access to a structural variation not available by either the NRPS or PKS 
system alone. 

To create such mutations, plasmids are constructed carrying in-frame 
deletions of DNA segments encompassing a portion of the blm synthetase activities. 
Construction of specific deletions is preferably accomplished by one of the following two 
strategies. The first involves subcloning of a DNA fragment in a gene replacement vector, 
selection of two restriction sites suitably located at the two ends of the DNA segments, and 
deletion of this segment from within the plasmid by rejoining the two resulting ends. An 
in-frame deletion can be obtained by a suitable combination of Klenow filling and S1 
treatment of both ends prior to ligation. 
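
The requirement that a deletion be in-frame can be checked computationally before any construct is built: the number of base pairs removed must be a multiple of three so that downstream codons are read unchanged. The sketch below uses a short made-up open reading frame purely for illustration; the sequence and function names are hypothetical.

```python
def is_in_frame(deleted_bp):
    """A deletion preserves the reading frame only if it removes 3n bases."""
    return deleted_bp % 3 == 0


def delete_segment(seq, start, end):
    """Return seq with seq[start:end] removed (0-based, end exclusive)."""
    if not 0 <= start <= end <= len(seq):
        raise ValueError("deletion coordinates fall outside the sequence")
    return seq[:start] + seq[end:]


def codons(seq):
    """Split a sequence into complete codons, dropping any trailing bases."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


if __name__ == "__main__":
    toy_orf = "ATGGCCAAGGTCTGA"   # hypothetical ORF: ATG GCC AAG GTC TGA
    for start, end in ((3, 9), (3, 7)):
        product = delete_segment(toy_orf, start, end)
        print(end - start, "bp deleted, in frame:",
              is_in_frame(end - start), codons(product))
```

Removing six bases leaves the downstream codons and the stop codon intact, whereas removing four bases shifts the frame and the original stop codon is no longer read.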

The second approach involves polymerase chain reaction (PCR) amplification 
of two DNA segments that flank the region to be deleted, followed by joining of the two 
fragments in the correct orientation in a gene replacement vector. This can be accomplished 
by designing PCR primers with suitable restriction sites. The restriction site used to generate 
the deletion and the sequences to serve as templates for the PCR amplification are chosen so 
as to generate two segments of blm synthetase DNA of approximately equal length in the 
construction in order to maximize the chance of gene replacement. The gene replacement 
vector containing the allelic or deletion mutation is introduced into a Streptomyces strain 
(e.g., S. verticillus). Integration of the plasmid into the S. verticillus chromosome via a 
single reciprocal homologous recombination will yield a recombinant that will be isolated by 
selection for the vector marker. The resulting integrants are then grown under non-selective 
conditions, and further resolution by selection for the loss of the vector marker via the second 
homologous recombination event will produce the desired deletion mutants. 

Southern analysis of the isolated deletion mutants with the target DNA is 
performed to ensure that the expected double crossover recombination event has taken place. 
The first approach is convenient if there are suitably spaced restriction sites in the DNA 
sequence. The second approach enables the deletion of any DNA segment but may be 
limited by the size of the DNA segments that can be amplified by PCR. These S. verticillus 
recombinants are cultured under typical conditions for BLM production and the fermentation 
broth is screened for the production of any novel BLM analogs resulting from the specific 
mutations in the blm synthetase locus. 

B) Synthesis of BLM analogs by "random" modification of blm synthetase 
genes. 

Bleomycin analogs can also be synthesized by randomly/haphazardly altering 
genes in the blm cluster, expressing the products of the randomly modified megasynthetase, 
and then screening the products for the desired activity. Methods of "randomly" altering blm 
cluster genes are described below. 

VI. Generation of other synthetic systems. 

In addition to the production of bleomycin or modified bleomycins, the blm 
gene cluster or elements thereof can be used by themselves or in combination with NRPS 
and/or PKS modules and/or enzymatic domains of other PKS and/or NRPS systems to 
produce a wide variety of compounds including, but not limited to various polyketides, 
polypeptides, polyketide/polypeptide hybrids, various oxazoles and thiazoles, various sugars, 
various methylated polypeptides/polyketides, and the like. As with the production of 
modified bleomycins described above, such compounds can be produced, in vivo or in vitro, 
by catalytic biosynthesis using large, modular PKSs, NRPSs, and hybrid PKS/NRPS 
systems. The megasynthetases directing such syntheses can be rationally designed, e.g. by 
predetermined alteration/modification of polyketide and/or polypeptide and/or hybrid 
PKS/NRPS pathways. Alternatively, large combinatorial libraries of cells harboring various 
megasynthetases can be produced by the random modification of particular pathways and 
then selected for the production of a molecule or molecules of interest. It will be appreciated 
that, in certain embodiments, such libraries of megasynthetases/modified pathways can be 
used to generate large, complex combinatorial libraries of compounds which themselves can 
be screened for a desired activity. 

A) Directed modification of biomolecules. 

Elements (e.g. open reading frames) of the blm biosynthetic gene cluster 
and/or variants thereof can be used in a wide variety of "directed" biosynthetic processes (i.e. 
where the process is designed to modify and/or synthesize one or more particular preselected 
metabolite(s)). Polypeptides encoded by particular open reading frames or combinations of 
open reading frames can be utilized to perform particular chemical modifications of 
biological molecules. 

Thus, for example, open reading frames encoding a polypeptide synthetase can 
be used to chemically modify an amino acid by coupling it to another amino acid. In another 
example, the methyl transferase in BlmVIII can be utilized to introduce methyl groups into 
polyketides and other substrates. The glycosyl transferases can be used to glycosylate 
appropriate substrates, and so forth. These examples are merely illustrative. One of skill in 
the art, utilizing the information provided here, can perform literally countless chemical 
modifications and/or syntheses using either "native" bleomycin biosynthesis metabolites as 
the substrate molecule, or other molecules capable of acting as substrates for the particular 
enzymes in question. Other substrates can be identified by routine screening. Methods of 
screening enzymes for specific activity against particular substrates are well known to those 
of skill in the art. 

The biosyntheses can be performed in vivo, e.g. by providing a host cell 
comprising the desired blm gene cluster open reading frame(s), and/or in vitro, e.g., by 
providing the polypeptides encoded by the blm gene cluster ORFs and the appropriate 
substrates and/or cofactors. 

B) Directed engineering of novel synthetic pathways.

In numerous embodiments of this invention, novel polyketides, polypeptides,
and combinations thereof are created by modifying known PKSs or NRPSs so as to introduce
variations into known polymers synthesized by the enzymes. Such variations may be
introduced by design, for example to modify a known molecule in a specific way, e.g. by
replacing a single monomeric unit within a polymer with another, thereby creating a
derivative molecule of predicted structure. Such variations can also be made by adding one
or more modules to a known PKS or NRPS, or by removing one or more modules from a
known PKS or NRPS. Such novel PKSs or NRPSs can readily be made using a variety of
techniques, including recombinant methods and in vitro synthetic methods.

Using any of these methods, it is possible to introduce PKS domains into an
NRPS, or vice versa, thereby creating novel molecules including both peptide and polyketide
structural domains. For example, a PKS enzyme producing a known polyketide can be
modified so as to include an additional module that adds a peptide moiety into the
polyketide. Novel molecules synthesized using these methods can be screened, using
standard methods, for any activity of interest, such as antibiotic activity, effects on the cell
cycle, effects on the cytoskeleton, etc.

Novel polyketides, polypeptides, or combinations thereof can also be made by
creating novel PKSs or NRPSs de novo, using recombinant or in vitro synthetic methods.
Such novel arrangements of domains can be designed, e.g. to create a specific polymer. In
addition to creating novel PKSs or NRPSs by combining modules, the methods of this
invention can also be used to make novel modules that can add new monomeric units to a
growing polypeptide or polyketide chain. Because the identity of each module, and,
consequently, the identity of the monomer added by the module, is determined by the
identity and number of the functional domains comprising the module, it is possible to
produce novel monomeric units by creating novel combinations of functional domains within
a module. Such novel modules can be created by design, for example to make a specific
module that will add a specific monomer to a polyketide or polypeptide, or can be created by
the random association of domains so as to produce libraries of novel modules. Such novel
modules can be made using recombinant or in vitro synthetic means.

Mutations can be made to the native NRPS and/or PKS subunit sequences and
such mutants used in place of the native sequence, so long as the mutants are able to function
with other NRPS and/or PKS subunits to collectively catalyze the synthesis of an identifiable
polyketide and/or polypeptide. Such mutations can be made to the native sequences using
conventional techniques such as by preparing synthetic oligonucleotides including the
mutations and inserting the mutated sequence into the gene encoding a NRPS and/or PKS
subunit using restriction endonuclease digestion (see, e.g., Kunkel (1985) Proc. Natl. Acad.
Sci. USA 82: 448; Geisselsoder et al. (1987) BioTechniques 5: 786). Alternatively, the
mutations can be effected using a mismatched primer (generally 10-20 nucleotides in length)
which hybridizes to the native nucleotide sequence, at a temperature below the melting
temperature of the mismatched duplex. The primer can be made specific by keeping primer
length and base composition within relatively narrow limits and by keeping the mutant base
centrally located (Zoller and Smith (1983) Meth. Enzymol. 100: 468). Primer extension is
effected using DNA polymerase, the product is cloned, and clones containing the mutated
DNA, derived by segregation of the primer-extended strand, are selected. Selection can be
accomplished using the mutant primer as a hybridization probe. The technique is also
applicable for generating multiple point mutations (see, e.g., Dalbie-McFarland et al. (1982)
Proc. Natl. Acad. Sci. USA 79: 6409). PCR mutagenesis will also find use for effecting the
desired mutations.
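
The mismatched-primer approach described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the template sequence are hypothetical, and the melting temperature is approximated with the Wallace rule (2 °C per A/T, 4 °C per G/C), which applies to a perfectly matched short duplex and therefore overestimates the melting temperature of the mismatched duplex.

```python
def mutagenic_primer(template: str, pos: int, new_base: str, flank: int = 8) -> str:
    """Return a primer equal to `template` around `pos`, with `new_base` at `pos`.

    The mutant base is placed centrally, with `flank` matched bases on each side.
    """
    template = template.upper()
    new_base = new_base.upper()
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    if pos - flank < 0 or pos + flank >= len(template):
        raise ValueError("not enough template on both sides of the mutation")
    if template[pos] == new_base:
        raise ValueError("new_base equals the native base; nothing to mutate")
    primer = template[pos - flank:pos] + new_base + template[pos + 1:pos + flank + 1]
    # The text gives 10-20 nucleotides as the usual primer length.
    if not 10 <= len(primer) <= 20:
        raise ValueError("primer length outside the 10-20 nt range")
    return primer


def wallace_tm(oligo: str) -> int:
    """Wallace-rule estimate in degrees C for a perfectly matched short oligo."""
    oligo = oligo.upper()
    at = oligo.count("A") + oligo.count("T")
    gc = oligo.count("G") + oligo.count("C")
    return 2 * at + 4 * gc


# Hypothetical template; position 10 (0-based) is changed from G to T.
example_primer = mutagenic_primer("ACGTACGTACGTACGTACGTA", pos=10, new_base="T")
print(example_primer, wallace_tm(example_primer))
```

In practice the annealing temperature would be chosen below the estimate, since the single mismatch destabilizes the duplex.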

C) Random modification of PKS/NRPS pathways. 

In another embodiment, variations can be made randomly, for example by 
making a library of molecular variants of a known polymer by randomly mutating one or 
more PKS or NRPS modules and/or enzymatic domains or by randomly replacing one or 
more modules or enzymatic domains in a known PKS or NRPS with a collection of 
alternative modules and/or enzymatic domains.

The PKS and/or NRPS modules can be combined into a single multi-modular 
enzyme, thereby dramatically increasing the number of possible combinations obtained using 
these methods. These combinations can be made using standard recombinant or nucleic acid 
amplification methods, for example by shuffling nucleic acid sequences encoding various 
modules or enzymatic domains to create novel arrangements of the sequences, analogous to 
DNA shuffling methods described in Crameri et al. (1998) Nature 391: 288-291, and in U.S.
Patents 5,605,793 and 5,837,458. In addition, novel combinations can be made in vitro,
for example by combinatorial synthetic methods. Novel polymers, or polymer libraries, can 
be screened for any specific activity using standard methods. 
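
The growth in the number of combinations can be illustrated with a short Python sketch. It assumes, purely for illustration, that a multi-modular enzyme is assembled by choosing one module from a pool of alternatives at each position; the module names are hypothetical placeholders and do not correspond to modules disclosed herein.

```python
from itertools import product
from math import prod


def enumerate_assemblies(pools):
    """Return every ordered assembly that takes one module from each position's pool."""
    return list(product(*pools))


def library_size(pools):
    """Number of distinct assemblies: the product of the pool sizes."""
    return prod(len(pool) for pool in pools)


# Hypothetical pools of alternative modules for three positions.
pools = [
    ["load_A", "load_B"],
    ["ext_1", "ext_2", "ext_3"],
    ["term_X", "term_Y"],
]
assemblies = enumerate_assemblies(pools)
print(library_size(pools), assemblies[0])
```

Because the library size is a product over positions, adding one more position with k alternatives multiplies the library by k.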

Random mutagenesis of the nucleotide sequences obtained as described above 
can be accomplished by several different techniques known in the art, such as by altering 
sequences within restriction endonuclease sites, inserting an oligonucleotide linker randomly 
into a plasmid, by irradiation with X-rays or ultraviolet light, by incorporating incorrect 
nucleotides during in vitro DNA synthesis, by error-prone PCR mutagenesis, by preparing 
synthetic mutants or by damaging plasmid DNA in vitro with chemicals. Chemical mutagens 
include, for example, sodium bisulfite, nitrous acid, hydroxylamine, agents which damage or 
remove bases thereby preventing normal base-pairing such as hydrazine or formic acid, 
analogues of nucleotide precursors such as nitrosoguanidine, 5-bromouracil, 2-aminopurine, 
or acridine intercalating agents such as proflavine, acriflavine, quinacrine, and the like.
Generally, plasmid DNA or DNA fragments are treated with chemicals, transformed into E.
coli and propagated as a pool or library of mutant plasmids. 
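
As a rough illustration of what a pool of random mutants looks like, the following Python sketch simulates point substitutions in silico. It is a toy model under loud assumptions: each base is substituted independently with the same probability, and no particular mutagen's chemistry or bias is modeled. The function names and the example sequence are hypothetical.

```python
import random

BASES = "ACGT"


def mutate(seq, rate, rng):
    """Return a copy of `seq` in which each base is substituted with probability `rate`."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Substitute with one of the other three bases, chosen uniformly.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def mutant_pool(seq, rate, size, seed=0):
    """Build a pool of `size` independently mutated copies of `seq`."""
    rng = random.Random(seed)  # seeded so that a run can be repeated
    return [mutate(seq, rate, rng) for _ in range(size)]


pool = mutant_pool("ATGGCCACCGAT", rate=0.05, size=1000)
print(len(pool), len(set(pool)))
```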

Large populations of random enzyme variants can be constructed in vivo 
using "recombination-enhanced mutagenesis." This method employs two or more pools of, 
for example, 10^6 mutants each of the wild-type encoding nucleotide sequence that are
generated using any convenient mutagenesis technique, described more fully above, and then 
inserted into cloning vectors. 

D) Incorporation and/or modification of non-blm cluster elements. 

In either the directed or random approaches, nucleic acids encoding novel 
combinations of modules and/or enzymatic domains are introduced into a cell. In one embodiment,
nucleic acids encoding one or more PKS or NRPS domains are introduced into a cell so as to 
replace one or more domains of an endogenous PKS or NRPS within a chromosome of the 
cell. Endogenous gene replacement can be accomplished using standard methods, such as 
homologous recombination. Nucleic acids encoding an entire PKS, NRPS, or combination 
thereof can also be introduced into a cell so as to enable the cell to produce the novel 
enzyme, and, consequently, synthesize the novel polymer. In a preferred embodiment, such 
nucleic acids are introduced into the cell optionally along with a number of additional genes, 
together called a 'gene cluster,' that influence the expression of the genes, survival of the 
expressing cells, etc. In a particularly preferred embodiment, such cells do not have any 
other PKS- or NRPS- encoding genes or gene clusters, thereby allowing the straightforward 
isolation of the polymer synthesized by the genes introduced into the cell. 

Furthermore, the recombinant vector(s) can include genes from a single PKS 
and/or NRPS gene cluster, or may comprise hybrid replacement PKS gene clusters with, e.g., 
a gene for one cluster replaced by the corresponding gene from another gene cluster. For 
example, it has been found that ACPs are readily interchangeable among different synthases 
without an effect on product structure. Furthermore, a given KR can recognize and reduce 
polyketide chains of different chain lengths. Accordingly, these genes are freely 
interchangeable in the constructs described herein. Thus, the replacement clusters of the 
present invention can be derived from any combination of PKS and/or NRPS gene sets that 
ultimately function to produce an identifiable polyketide and/or peptide. 

Examples of hybrid replacement clusters include, but are not limited to, 
clusters with genes derived from two or more of the act gene cluster, the whiE gene cluster, 
frenolicin (fren), granaticin (gra), tetracenomycin (tcm), 6-methylsalicylic acid (6-msas),
oxytetracycline (otc), tetracycline (tet), erythromycin (ery), griseusin (gris), nanaomycin,
medermycin, daunorubicin, tylosin, carbomycin, spiramycin, avermectin, monensin,
nonactin, curamycin, rifamycin and candicidin synthase gene clusters, among others. (For a
discussion of various PKSs, see, e.g., Hopwood and Sherman (1990) Ann. Rev. Genet. 24:
37-66; O'Hagan (1991) The Polyketide Metabolites, Ellis Horwood Limited.)

A number of hybrid gene clusters have been constructed, having components
derived from the act, fren, tcm, gris and gra gene clusters (see, e.g., U.S. Patent 5,712,146).
Other hybrid gene clusters, as described above, can easily be produced and screened using
the disclosure herein, for the production of identifiable polyketides, polypeptides or
polyketide/polypeptide hybrids.

Host cells (e.g. Streptomyces) can be transformed with one or more vectors,
collectively encoding a functional PKS/NRPS set (e.g. for a bleomycin or bleomycin analog), or
a cocktail comprising a random assortment of PKS and/or NRPS genes, modules, active
sites, or portions thereof. The vector(s) can include native or hybrid combinations of PKS
and/or NRPS subunits or cocktail components, or mutants thereof. As explained above, the
gene cluster need not correspond to the complete native gene cluster but need only encode
the necessary PKS and/or NRPS components to catalyze the production of the desired
product. For example, in Streptomyces aromatic PKSs, carbon chain assembly requires the
products of three open reading frames (ORFs). ORF1 encodes a ketosynthase (KS) and an
acyltransferase (AT) active site (KS/AT); ORF2 encodes a chain length determining factor
(CLF), a protein similar to the ORF1 product but lacking the KS and AT motifs; and ORF3
encodes a discrete acyl carrier protein (ACP). Some gene clusters also code for a
ketoreductase (KR) and a cyclase, involved in cyclization of the nascent polyketide
backbone. However, it has been found that only the KS/AT, CLF, and ACP need be present
in order to produce an identifiable polyketide. Thus, in the case of aromatic PKSs derived
from Streptomyces, these three genes, without the other components of the native clusters,
can be included in one or more recombinant vectors, to constitute a "minimal" replacement
PKS gene cluster.

E) Variation of starter and extender units. 

In addition to varying the PKS and/or NRPS modules and/or domains,
variations in the products produced by various PKS/NRPS systems can be obtained by
varying the starter units and/or the extender units. Thus, for example, a considerable degree
of variability exists for starter units, e.g., acetyl CoA, malonamyl CoA, propionyl CoA,
acetate, butyrate, isobutyrate and the like. In addition, naturally occurring PKSs and/or
NRPSs have shown some tolerance for varying extender units.

F) Examples of preferred modifications. 

As indicated above, the novel PKS and NRPS modules and enzymatic 
domains identified herein can be used to perform specific single modifications of particular 
substrates, or as components of complex synthetic pathways to generate particular products 
or large combinatorial libraries. As described in the Examples, a number of modules of the 
blm gene cluster provide novel functionality. By way of example, a few preferred reactions 
are listed below. These examples are intended to be illustrative and are not exhaustive nor 
limiting. 

1. Use of BlmVIII PKS to introduce a branched methyl group.

The blmVIII gene identified herein encodes a PKS module consisting of 
domains characteristic for known PKSs, such as ketoacyl synthase (KS), acyltransferase 
(AT), ketoreductase (KR), and ACP, with malonyl CoA acting as an extending unit. 
However, the identification of an integrated methyltransferase (MT) domain in the middle of 
BlmVIII is unique, representing the first PKS from actinomycetes that contains an internal 
MT domain. The use of this methyltransferase domain allows the introduction of a branched
methyl group during a polyketide and/or polypeptide and/or hybrid
polyketide/polypeptide synthesis. Figure 5 illustrates the use of BlmVIII PKS in engineering
a polyketide biosynthesis that introduces a branched methyl group.

The first formula in Figure 5 illustrates a polyketide synthesis mediated by 6- 
deoxyerythronolide B synthase (DEBS) which normally catalyzes the biosynthesis of the 
erythromycin aglycone, 6-deoxyerythronolide B. The remaining formulas show how the use 
of the blmVIII methyltransferase (MT) domain at different points in the synthesis results in the
introduction of a methyl group at different locations in the resulting product. 

In view of this illustration, one of skill in the art would appreciate that the 
blmVIII MT domain can be used in a wide variety of biosyntheses to introduce methyl 
branches.

2. Use of the blm gene cluster to make thiazolidine, thiazoline,
thiazole, bi-thiazolidine, bithiazoline, and bithiazole-containing
compounds.

The BlmIV and BlmIII NRPSs are characterized by unusual Cy domains as
well as an unprecedented Ox domain, providing an efficient biosynthesis for a bithiazole
structure. While thiazoline is the direct product of the Cy domain, the thiazoline-to-thiazole
conversion generally requires an additional oxidation step. We identified at the C-terminus
of NRPS-0 an additional domain that shows low, but significant, sequence
homology to a family of putative oxidases/dehydrogenases, including the McbC protein of
the microcin B17 synthase (Table 1). Microcin B17 synthase catalyzes the synthesis of the
oxazole and thiazole-containing peptide antibiotic microcin B17, and McbC has been
proposed to play a role in catalyzing the oxazoline/thiazoline-to-oxazole/thiazole conversion.
Consequently, we propose that this extra domain at the C-terminus of NRPS-0 provides the
oxidase/dehydrogenase activity for the biosynthesis of the bithiazole moiety of BLM,
defining a novel Ox domain for NRPSs.

It is noteworthy that a cell-free preparation from Sv ATCC 15003 has been
reported to catalyze the conversion of phleomycins to BLMs in the presence of NAD+,
supporting the hypothesis that the bithiazole moiety of BLM results from stepwise
oxidations of a bithiazoline precursor (Fig. 1A). (The phleomycin producer could be
imagined to result from the loss of its Ox activity for the first thiazoline ring.) Given the
wide distribution of thiazole or oxazole rings in natural products exhibiting an impressive
array of biological activities, the cloning of the blmIV and blmIII genes and the identification of the
Ox domain open many opportunities to study thiazole biosynthesis and to synthesize novel
thiazole-containing molecules by engineering peptide biosynthesis.

Representative thiazole syntheses using variants of the blm NRPS are
illustrated in Figure 6. Note that in Figure 6, A_M and A_N refer to an A domain that activates
an amino acid with R_M and R_N groups, respectively. A_C refers to an A domain that
activates Cys (X = SH) or Ser (X = OH) that can be cyclized to form the oxazoline/thiazoline
or oxazole/thiazole structures. DH is a dehydratase. In view of these representative
examples, one of skill in the art would appreciate that the blm NRPS domain and its variants
can be used in a wide variety of chemical syntheses to make thiazolidine, thiazoline, thiazole,
bi-thiazolidine, bithiazoline, or bithiazole-containing compounds.




3. Use of the blm gene cluster to make heterocyclic ring-containing
compounds.

Various blm modules can be used to produce heterocyclic ring-containing
compounds. Such heterocycles include, but are not limited to, five-membered S- and N-containing
compounds of the thiazolidine, thiazoline and thiazole family or the O- and N-containing
compounds of the oxazolidine, oxazoline, and oxazole family. Again, the
preparation of such compounds is illustrated in Figure 6.

4. Use of the blm gene cluster to make sugars. 

In still another embodiment, the blm gene cluster or elements thereof can be 
used to make sugars. Such sugars include, but are not limited to L-sugars (with the BlmG 
epimerase), sugars modified by a carbamoyl group (e.g., using BlmD), and various 
disaccharides. Representative examples of such syntheses are illustrated in Figure 7. Such 
sugar biosynthesis genes can also be used to attach sugars onto other polyketide and/or
peptide aglycones. 

G) Screening of products.

Particularly where large combinatorial libraries are synthesized, e.g. using one
or more modules and/or enzymatic domains of the blm gene cluster, it will often be desired to
screen the resulting compound(s) for the desired activity. Methods of screening compounds
(e.g. polypeptides, polyketides, sugars, thiazoles, etc.) for various activities of interest (e.g.
cytotoxicity, antimicrobial activity, particular chemical activities, etc.) are well known to
those of skill in the art.

Where large numbers of compounds are produced, it is often desired to
rapidly screen such compounds using "high throughput systems" (HTS). High throughput
assay systems are well known to those of skill in the art and many such systems are
commercially available (see, e.g., Zymark Corp., Hopkinton, MA; Air Technical Industries,
Mentor, OH; Beckman Instruments, Inc., Fullerton, CA; Precision Systems, Inc., Natick,
MA, etc.). These systems typically automate entire procedures including all sample and
reagent pipetting, liquid dispensing, timed incubations, and final readings of the microplate
in detector(s) appropriate for the assay. These configurable systems provide high
throughput and rapid start up as well as a high degree of flexibility and customization. The
manufacturers of such systems typically provide detailed protocols for the various high 
throughput screens.

VII. In Vitro syntheses.

In additional embodiments of this invention, bleomycins and other
polyketides and/or polypeptides are synthesized and/or modified in vitro. Individual
enzymatic domains or modules can be used in vitro to modify a unit and/or to add a single
monomeric unit to a growing polyketide or polypeptide chain. In one approach a
megasynthetase providing all the desired synthetic activities is recombinantly expressed and
then provided with the appropriate substrates and buffer system, e.g. in a bioreactor, to direct the
synthesis of the desired product. In another approach, various PKSs and/or NRPSs are
provided in different solutions and the growing polymer chains are sequentially
introduced into the plurality of solutions, each containing a single (or several) PKS or NRPS
modules. In still another embodiment, the PKS and/or NRPS modules or enzymatic domains
are provided attached to a solid support and a fluid containing the growing macromolecule
is passed over the surface whereby the PKSs or NRPSs are able to react with the target
substrate.

In one preferred embodiment, a combinatorial library of polyketides or 
polypeptides, or combinations thereof, is created by using automated means to facilitate the 
sequential introduction of a multitude of polymeric chains, each attached to a solid support, 
to a collection of solutions, each containing a single PKS or NRPS module. These 
automated means can be used to systematically vary the sequence by which each polymeric 
chain is introduced into the various solutions, thereby creating a combinatorial library. 
Numerous methods are well known in the art to create combinatorial libraries of molecules 
by the sequential addition of monomeric units, for example as described in WO 97/02358. 
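
The systematic variation of the order of introduction can be sketched in Python as follows. This is a minimal illustration that assumes each support-bound chain visits a given solution at most once; the solution labels are hypothetical. If revisiting a solution were allowed, `itertools.product` with a `repeat` argument would be the natural replacement for `permutations`.

```python
from itertools import permutations


def routes(solutions, steps):
    """Return every ordered route of `steps` distinct solutions for one chain."""
    return list(permutations(solutions, steps))


# Hypothetical solutions, each holding a single PKS or NRPS module.
solutions = ["M1", "M2", "M3", "M4"]
all_routes = routes(solutions, steps=3)
print(len(all_routes), all_routes[:2])
```

Each route corresponds to one member of the combinatorial library, so an automated system would need one support-bound chain per route.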

VIII. Kits. 

In still another embodiment, this invention provides kits for practice of the 
methods described herein. In one preferred embodiment, the kits comprise one or more 
containers containing nucleic acids encoding one or more of the blm gene cluster ORFs 
and/or one or more of the BLM PKS or NRPS modules or enzymatic domains. Certain kits 
may comprise vectors encoding the blm ORFs and/or cells containing such vectors. The kits
may optionally include any reagents and/or apparatus to facilitate practice of the assays 
described herein. Such reagents include, but are not limited to buffers, labels, labeled 
antibodies, bioreactors, cells, etc. 

In addition, the kits may include instructional materials containing directions
(i.e., protocols) for the practice of the methods of this invention. Preferred instructional
materials provide protocols utilizing the kit contents for creating or modifying a blm module or
ORF and/or for synthesizing or modifying a molecule using one or more blm modules and/or
enzymatic domains. While the instructional materials typically comprise written or printed 
materials they are not limited to such. Any medium capable of storing such instructions and 
communicating them to an end user is contemplated by this invention. Such media include, 
but are not limited to electronic storage media (e.g., magnetic discs, tapes, cartridges, chips), 
optical media (e.g., CD ROM), and the like. Such media may include addresses to internet 
sites that provide such instructional materials. 

EXAMPLES 

The following examples are offered to illustrate, but not to limit, the claimed
invention.

Example 1 

Bleomycin biosynthesis in Streptomyces verticillus ATCC 15003: A model for hybrid
peptide and polyketide biosynthesis.

Here we report the cloning and characterization of the blm biosynthesis gene 
cluster from Sv ATCC 15003 (Fig. 2). Sequence analysis and biochemical characterization of 
individual modules enabled us to align the nine NRPS and one PKS modules in a linear order 
to constitute the Blm megasynthetase complex (Fig. 1B). These studies revealed several
unprecedented features for peptide and polyketide biosynthesis, setting the stage to 
investigate the molecular basis for intermodular communication between NRPS and PKS, 
and supported the wisdom of combining individual NRPS and PKS modules for 
combinatorial biosynthesis to make novel "unnatural" natural products from amino acids and 
short carboxylic acids. 

Materials and Methods. 

General procedures.

Escherichia coli DH5α (Sambrook et al. (1989) Molecular Cloning: A
Laboratory Manual 2nd ed, Cold Spring Harbor Laboratory Press, Cold Spring Harbor,
USA), E. coli XL1-Blue MR (Stratagene, La Jolla, CA), E. coli BL21(DE-3) (Novagen,
Madison, WI), and Sv ATCC 15003 (American Type Culture Collection, Rockville, MD)
were used in this work. pOJ446 (Agricultural Research Service Culture Collection, Peoria,
IL), pQE60 (Qiagen, Santa Clarita, CA), pET28a and pET29a (Novagen), and other plasmids
were from commercial sources. E. coli (Sambrook, supra) and Sv ATCC 15003 strains
(Hopwood et al. (1985) Genetic Manipulation of Streptomyces: A Laboratory Manual, The
John Innes Foundation, Norwich, UK) were cultured under standard conditions.

Plasmid preparation was carried out by using commercial kits (Qiagen). Total
Sv ATCC 15003 DNA was isolated according to literature protocols (Hopwood et al. (1985)
Genetic Manipulation of Streptomyces: A Laboratory Manual, The John Innes Foundation,
Norwich, UK; Nagaraja et al. (1987) Methods Enzymol. 153: 166-198). Restriction enzymes
and other molecular biology reagents were from commercial sources, and digestions and
ligation followed standard methods (Sambrook, supra). For Southern analysis, digoxigenin
labelling of DNA probes, hybridization, and detection were performed according to the
protocols provided by the manufacturer (Boehringer Mannheim Biochemicals, Indianapolis,
IN).

Automated DNA sequencing was carried out on an ABI Prism 377 DNA 
Sequencer (Perkin-Elmer/ABI, Foster City, CA), and this service was provided by either the 
DBS Automated DNA Sequencing Facility, UC Davis, or Davis Sequencing (Davis, CA). 
Data were analyzed by the ABI Prism Sequencing 2.1.1 software and the Genetics Computer 
Group (GCG) program (Madison, WI). 

Cloning and sequencing of the blm gene cluster. 

A genomic library of Sv ATCC 15003 was constructed in pOJ446 according to
literature procedures (Nagaraja et al. (1987) Methods Enzymol. 153: 166-198) and screened
with probes made from both ends of the blmAB locus (Sugiyama et al. (1994) Gene 151: 11-16;
Calcutt and Schmidt (1994) Gene 151: 17-21), leading to the localization of 140-kb
contiguous DNA, of which 100-kb is upstream (Fig. 2) and 40-kb is downstream (data not
shown) of the blmAB genes. Heterologous NRPS probes were amplified from Sv
ATCC 15003 by polymerase chain reaction (PCR) according to literature procedures (Turgay
and Marahiel (1994) Peptide Res. 7: 238-241) and used to screen the entire 140-kb DNA by
Southern analysis under various hybridization conditions (Shen et al. (1999) Bioorg. Chem.
27: 155-171).

Prediction of substrate specificity of NRPSs. 

The nine Blm NRPS modules were compared with eighty-four modules from
various bacterial and fungal NRPSs available in GenBank, including those with known or
putative specificity for amino acids present in BLM. A table of overall similarities/identities
was generated by PILEUP analysis of the A3 to A6 regions, and the residues lining the
substrate binding pocket, identified by comparison with PheA (Conti et al. (1997) EMBO J. 16:
4174-4183), were determined by PILEUP/PRETTY analysis. The percentage similarities for each
Blm NRPS module were plotted against the rest of the NRPS modules to display the overall
sequence homology between the A3 to A6 regions. Those modules that showed significantly
higher homology were selected to compare the amino acid residues that line the substrate
binding pocket.
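
The ranking step described above can be illustrated with a short Python sketch. It is not the GCG PILEUP program: as a stated simplification it computes percent identity (rather than similarity) over an already aligned region, skipping gapped columns. The function names and the toy aligned sequences are hypothetical.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


def rank_by_identity(query, references):
    """Sort reference modules by identity to the query, highest first."""
    scores = {name: percent_identity(query, seq) for name, seq in references.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy aligned fragments standing in for A3-to-A6 regions.
query = "ACDE-F"
references = {"ref_1": "ACDQ-F", "ref_2": "ACDE-F", "ref_3": "GCDQWF"}
print(rank_by_identity(query, references))
```

Modules at the top of such a ranking would be the candidates whose binding-pocket residues are then compared in detail.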

Overproduction and biochemical characterization of the NRPS-1A and NRPS-6A proteins.

Heterologous expression of the A domains in E. coli was performed according
to literature procedures (Mootz and Marahiel (1997) J. Bacteriol. 179: 6843-6850). NRPS-1A
(forward primer 5'-AAC CCA TGG CTG CTT CCC TGA CCC GCC TGG CC-3' (SEQ
ID NO:76) and reverse primer 5'-CCT AGA TCT ACG GGC AGG TGG GGC GGT-3'
(SEQ ID NO:7^)) and NRPS-6A (forward primer 5'-GGG AAT TCC ATA TGA TCC TCA
CGT CCT TCC AC-3' (SEQ ID NO:7$) and reverse primer 5'-GGC AAG CTT GGG TGA
GGG TCC GTT CGG T-3' (SEQ ID NO:7^)) were amplified by PCR from Sv ATCC 15003
cosmid clones. The resulting 1.6-kb fragment of NRPS-1A was first cloned into the
NcoI/BglII sites of pQE60 and then moved as an NcoI/HindIII fragment into the similar sites
of pET29a to yield pBS10, and the resulting 1.6-kb fragment of NRPS-6A was directly
cloned into the NdeI/HindIII sites of pET28a to yield pBS11. Introduction of pBS10 and
pBS11 into E. coli BL21(DE-3) under standard expression conditions resulted in production
of NRPS-1A (with an N-terminal S-tag and a C-terminal His6-tag) and NRPS-6A (with an
N-terminal His6-tag), respectively. The soluble fractions of fusion proteins were subjected
sequentially to affinity chromatography on Ni-NTA resin and anion exchange
chromatography on a Hyper-D column (PerSeptive Biosystem, Framingham, MA), resulting
in NRPS-1A and NRPS-6A with near homogeneity.
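
The cloning sites engineered into the four primers above can be checked with a short Python sketch. The primer sequences are copied from the text; the recognition sequences used (NcoI CCATGG, BglII AGATCT, NdeI CATATG, HindIII AAGCTT) are the standard ones for these enzymes. The dictionary and function names are illustrative only.

```python
# Standard recognition sequences of the enzymes named in the cloning scheme.
SITES = {
    "NcoI": "CCATGG",
    "BglII": "AGATCT",
    "NdeI": "CATATG",
    "HindIII": "AAGCTT",
}

# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "NRPS-1A forward": "AAC CCA TGG CTG CTT CCC TGA CCC GCC TGG CC",
    "NRPS-1A reverse": "CCT AGA TCT ACG GGC AGG TGG GGC GGT",
    "NRPS-6A forward": "GGG AAT TCC ATA TGA TCC TCA CGT CCT TCC AC",
    "NRPS-6A reverse": "GGC AAG CTT GGG TGA GGG TCC GTT CGG T",
}


def find_sites(primer):
    """Map each enzyme whose site occurs in the primer to its 0-based start index."""
    seq = primer.replace(" ", "").upper()
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


for label, primer in PRIMERS.items():
    print(label, find_sites(primer))
```

The sites found agree with the scheme in the text: NcoI and BglII for the NRPS-1A fragment cloned into pQE60, and NdeI and HindIII for the NRPS-6A fragment cloned into pET28a.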

Results and Discussion. 



Cloning of the blm gene cluster from Sv ATCC 15003.

Davies and co-workers previously cloned two BLM resistance genes (blmA
and blmB) from Sv ATCC 15003 (Sugiyama et al. (1994) Gene 151: 11-16), and Calcutt and
Schmidt ((1994) Gene 151: 17-21) sequenced a 7.2-kb DNA fragment flanking the blmAB
genes, revealing seven open reading frames (orfs), none of which were found to encode Blm
NRPS or PKS enzymes. Given the precedent that antibiotic production genes commonly
occur as a cluster in actinomycetes, we adopted an approach combining chromosomal
walking from the blmAB resistance locus and DNA hybridization with heterologous NRPS
probes to clone and identify the blm cluster, leading to the localization of 140-kb contiguous
Sv ATCC 15003 DNA. DNA sequencing of approximately 90-kb of the blm gene cluster,
including the 7.2-kb blmAB locus, revealed 40 ORFs (Fig. 2). Preliminary functional
assignments were made by comparison of the deduced gene products with proteins of known
functions in the database. Among the ORFs identified from the blm cluster, we indeed found
a PKS module flanked by several NRPS modules, a fact that supports the hybrid
NRPS/PKS/NRPS hypothesis for BLM biosynthesis, along with several sugar biosynthesis
genes and genes encoding other biosynthesis enzymes as well as several resistance and
regulatory genes (Table 1).

Noteworthy are the genes encoding the putative NRPS and PKS enzymes.
The blmI, blmII, and blmXI genes encode NRPSs with an unusual architecture. In contrast to
all known NRPSs, which are of modular organization with each module consisting
minimally of a condensation (C), an adenylation (A), and a peptidyl carrier protein (PCP)
domain (1), BlmI, BlmII, and BlmXI are discrete proteins homologous to individual domains
of type I NRPSs. We have characterized BlmI as a type II PCP (18). The BlmII and BlmXI
proteins could serve as candidates for type II condensation enzymes. It is unclear yet what
role if any these discrete NRPS enzymes could play in BLM biosynthesis.

The blmIII, blmIV, blmV, blmVI, blmVII, blmIX, and blmX genes encode
modular NRPSs consisting of domains characteristic for known type I NRPSs (A special
thematic issue on polyketide and nonribosomal polypeptide biosynthesis, (1997) Chem. Rev.
97: 2463-2706), such as the A, PCP, C, and condensation/cyclization (Cy) domains (Konz et
al. (1997) Chem. Biol. 4: 927-937), as well as an unprecedented oxidation (Ox) domain (see
discussion below). However, BlmVI is unique among all the Blm NRPSs identified. Its
N-terminal module (NRPS-5) consists of an atypical A domain, which bears a close
resemblance to a family of acyl CoA synthases (Fitzmaurice and Kolattukudy (1997) J.
Bacteriol. 179: 2608-2615; Fitzmaurice and Kolattukudy (1998) J. Biol. Chem. 273: 8033-8039),
and an acyl carrier protein (ACP)-like domain (A special thematic issue on polyketide
and nonribosomal polypeptide biosynthesis, (1997) Chem. Rev. 97: 2463-2706). Its
C-terminal module is truncated and presumably interacts with BlmV to constitute the complete
NRPS-3 module (Fig. 1B). Also noteworthy are the C domain of NRPS-3 that lacks both
His residues of the conserved HHxxxDG (SEQ ID NO:4) active site for transpeptidation
(Stachelhaus et al. (1998) J. Biol. Chem. 273: 22773-22781) and the extra C domain at the
C-terminus of BlmV. These unusual features associated with BlmVI and BlmV may play
roles in the formation of the β-aminoalaninamide and the pyrimidine moieties of BLM,
which are unprecedented in peptide biosynthesis. For example, we propose that the
NRPS-4-activated Ser is first dehydrated into dehydroalanine before condensation; an analogous
Thr-to-2,3-dehydroaminobutyric acid dehydration has been observed in syringomycin
biosynthesis (Guenzi et al. (1998) J. Biol. Chem. 273: 32857-32863). Conjugate addition to
dehydroalanine by Asn on the NRPS-3 module downstream followed by an aminolysis to
cleave the Ser-Asn adduct off the Blm megasynthetase furnishes the β-aminoalaninamide
moiety (Fig. 1B). The former reaction could be catalyzed by the C domain of NRPS-3 that
apparently is nonfunctional for normal transpeptidation due to the lack of the active sites,
and the latter reaction could be catalyzed by the acyl CoA synthase-like domain of NRPS-5
in a process that resembles the acyl CoA synthase-catalyzed synthesis of acyl CoA from
carboxylic acid (Stachelhaus et al. (1998) J. Biol. Chem. 273: 22773-22781; Guenzi et al.
(1998) J. Biol. Chem. 273: 32857-32863) but in the reverse direction in the presence of an
amino donor (Fig. 1B).

The blmVIII gene encodes a PKS module consisting of domains characteristic
for known PKSs, such as ketoacyl synthase (KS), acyltransferase (AT), ketoreductase (KR),
and ACP, with malonyl CoA acting as an extending unit according to sequence comparison
of the AT domain (Haydock et al. (1995) FEBS Lett. 374: 246-248) (Fig. 1B). However, the
identification of an integrated methyltransferase (MT) domain (Kagan and Clarke (1994)
Arch. Biochem. Biophys. 310: 417-427) in the middle of BlmVIII is unique, representing the
first PKS from actinomycetes that contains an internal MT domain. The only other example
of a PKS from bacteria that contains an internal MT domain is HMWP1 of the yersiniabactin
gene cluster (Pelludat et al. (1998) J. Bacteriol. 180: 538-546). It has been assumed that
fungal PKSs in general contain internal MTs for the introduction of methyl branches into the
polyketide products, as has been shown recently in lovastatin biosynthesis (Kennedy et al.
(1999) Science 284: 1368-1372).

The Blm megasynthetase-templated assembly of BLM.

According to the hybrid NRPS/PKS/NRPS model for BLM biosynthesis (Fig.
1A), we predict a linear modular organization of individual NRPS and PKS modules to
constitute the Blm megasynthetase. Thus, the first functional domain of the Blm
megasynthetase should be an NRPS module that initiates BLM biosynthesis by activating
L-Ser as an amino acylthioester to set the stage for transpeptidation. Chain elongation
proceeds by sequential incorporation of L-Asn, L-Asn, L-His, and L-Ala, requiring four
additional NRPS modules. In the next step, a malonate reacts with the resulting pentapeptide
intermediate to form a β-ketothioester intermediate that is subsequently methylated at the
α-position and reduced at the β-keto group. A PKS module presumably dictates all these
biosynthetic events and interacts with the aligned NRPS module upstream to channel the
growing peptide intermediate from an NRPS module to a PKS module. After one cycle of
polyketide elongation, peptide elongation is resumed by incorporation of an L-Thr residue.
This step is presumably catalyzed by an NRPS module that interacts with the upstream PKS
module to channel the growing polyketide intermediate (as far as the active site is concerned)
from a PKS module to an NRPS module. At this stage, methylation occurs at the pyrimidine
moiety of the growing intermediate, presumably catalyzed by a discrete methyltransferase;
chain elongation is continued by three additional NRPS modules that incorporate a β-Ala
and two L-Cys molecules sequentially. Finally, the fully assembled BLM
peptide/polyketide/peptide backbone is hydroxylated at the β-position of the His residue,
presumably by a discrete hydroxylase, and released from the Blm megasynthetase complex
via nucleophilic substitution of the RCO-S-PCP species by a terminal amine to form the
BLM aglycone. Intermediates after five of the nine proposed elongation steps were in fact
isolated as P-3, P-3A, P-3K, P-4, P-5, P-5m, P-6m, and P-6mo (Takita and Muroka (1990)
pages 289-309 in Biochemistry of Peptide Antibiotics: Recent Advances in the Biotechnology
of β-Lactams and Microbial Peptides, Kleinkauf, H. & von Dohren, H. eds., W. de Gruyter,
N.Y.), which presumably resulted from premature departure from the Blm megasynthetase
complex before the chain reaches its full length (Fig. 1B).
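
By way of illustration only, the order of incorporation described in the preceding paragraph can be written down as a small data structure. The following Python sketch is not part of the disclosed method; the names `Module`, `BLM_ASSEMBLY_LINE`, `count_modules`, and `hybrid_junctions` are hypothetical and are introduced here solely to show that the stated order implies nine NRPS modules, one PKS module, and two NRPS/PKS junctions.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Module:
    kind: str       # "NRPS" or "PKS"
    substrate: str  # building block incorporated by this module


# Order of incorporation as stated in the text. Module names such as NRPS-1
# are omitted on purpose: their positions are assigned in Fig. 1B, not here.
BLM_ASSEMBLY_LINE: List[Module] = [
    Module("NRPS", "L-Ser"),     # initiation
    Module("NRPS", "L-Asn"),
    Module("NRPS", "L-Asn"),
    Module("NRPS", "L-His"),
    Module("NRPS", "L-Ala"),
    Module("PKS", "malonate"),   # one cycle of polyketide elongation
    Module("NRPS", "L-Thr"),
    Module("NRPS", "beta-Ala"),
    Module("NRPS", "L-Cys"),
    Module("NRPS", "L-Cys"),
]


def count_modules(line: List[Module], kind: str) -> int:
    """Count the modules of a given kind in the assembly line."""
    return sum(1 for m in line if m.kind == kind)


def hybrid_junctions(line: List[Module]) -> List[Tuple[Module, Module]]:
    """Return adjacent (upstream, downstream) pairs whose kinds differ."""
    return [(a, b) for a, b in zip(line, line[1:]) if a.kind != b.kind]


if __name__ == "__main__":
    print("NRPS modules:", count_modules(BLM_ASSEMBLY_LINE, "NRPS"))
    print("PKS modules:", count_modules(BLM_ASSEMBLY_LINE, "PKS"))
    for up, down in hybrid_junctions(BLM_ASSEMBLY_LINE):
        print(f"{up.kind}({up.substrate}) -> {down.kind}({down.substrate})")
```

In this sketch the ten modules give one initiation step followed by nine elongation steps, and the two junctions returned by `hybrid_junctions` correspond to the NRPS-to-PKS and PKS-to-NRPS hand-offs discussed in the text.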

Most of the bacterial NRPS gene clusters characterized to date are organized
in operon-type structures, encoding multimodular NRPS proteins with individual modules
organized along the chromosome in a linear order that parallels the order of the amino acids
in the resultant peptides, i.e., following the "colinearity rule" for the NRPS-templated
assembly of peptides from amino acids (A special thematic issue on polyketide and
nonribosomal polypeptide biosynthesis, (1997) Chem. Rev. 97: 2463-2706; Cane et al.
(1998) Science 282: 63-68). Inspection of the blm gene cluster (Fig. 2) showed that the Blm
NRPS and PKS modules apparently are not organized according to the "colinearity rule" for
BLM biosynthesis (Fig. 1). [Exception to the "colinearity rule" was also noted in the
syringomycin synthetase gene cluster (Guenzi et al. (1998) J. Biol. Chem. 273:
32857-32863), and in fact, Grandi and co-workers have demonstrated recently in Bacillus
subtilis that neither the operon-type structure nor the physical linkage of individual modules
is essential for proper assembly and activity of the surfactin NRPS megasynthetase (Guenzi et
al. (1998) J. Biol. Chem. 273: 14403-14410).] Realizing that BLM biosynthesis cannot
be rationalized according to the "colinearity rule", we determined the substrate specificity of
individual NRPS and PKS modules in an attempt to shed light on the modular organization
of the Blm megasynthetase complex. Brick and co-workers postulated, based on the X-ray
structural analysis of PheA, the A domain of GrsA, that the region between core sequences
A3 and A6 represents the amino acid specificity determinant of an NRPS module (Conti et al.
(1997) EMBO J. 16: 4174-4183). Since the A domains in all known NRPSs share a
significant sequence identity (ensuring that the main chain conformation of the enzymes is
likely to be very similar), they further proposed that the differing substrate specificity of
individual NRPS modules will be mainly determined by the nature of the amino acids lining
the substrate binding pocket (Stachelhaus et al. (1999) Chem. Biol. 6: 493-505; Conti et al.
(1997) EMBO J. 16: 4174-4183). Given this structural information and the vast number of
NRPS sequences available in GenBank, we developed a novel approach for predicting
substrate specificity for NRPS modules by comparing both the overall sequence of the A3
to A6 region and the eight amino acid residues that line the substrate binding pocket.

While a constant level of similarity (30%-40%) was evident among all the NRPS modules
analyzed, most of the Blm NRPS modules showed striking similarities (50%-60%) to a
particular cluster of NRPS modules, as exemplified in Fig. 3A for NRPS-1 and NRPS-6.
Close examination of these modules clustered with higher similarities revealed that they
activate the same or a very similar amino acid, based on which the putative substrate for the
NRPS in query could be predicted, i.e., NRPS-1 and NRPS-6A activate L-Cys and L-Thr,
respectively. These predictions were further supported by comparing the residues lining the
substrate binding pocket. For example, the amino acid residues lining the substrate binding
pocket for NRPS-1 and NRPS-6 are almost identical to those of NRPS modules that are known
to activate L-Cys and L-Thr, respectively, as shown in Fig. 3B. To verify the predicted
amino acid specificity, we overproduced and purified the NRPS-1A and NRPS-6A proteins
(Fig. 3C) and examined their substrate specificity according to the amino acid-dependent
ATP-PPi exchange assay (Lee and Lipmann (1970) Meth. Enzymol. 43: 585-602; Ku et al.
(1997) Chem. Biol. 4: 203-207). NRPS-1A and NRPS-6A indeed specifically activate L-Cys
and L-Thr, respectively, among the amino acids tested (Fig. 3D). The latter results greatly
enhanced our confidence in predicting the substrate specificity of an NRPS module by the
above method. We subsequently determined the substrate specificity for all the NRPS modules
identified from the blm gene cluster and they in fact accounted for all nine amino acids
required for BLM biosynthesis (Fig. 2).
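
The pocket-residue comparison described above can be sketched, in a highly simplified form, as follows. This Python example is offered only as an illustration under stated assumptions and is not the procedure actually used: it scores a query by simple percent identity over pre-aligned eight-residue pocket strings, whereas the text also compares the overall A3 to A6 region. The function names and the reference strings are hypothetical placeholders; they are not sequences from Fig. 3B or from GenBank.

```python
from typing import Dict, Tuple


def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions between two pre-aligned strings."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def predict_substrate(query: str, references: Dict[str, str]) -> Tuple[str, float]:
    """Return the reference substrate whose pocket best matches the query."""
    best = max(references, key=lambda name: percent_identity(query, references[name]))
    return best, percent_identity(query, references[best])


if __name__ == "__main__":
    # Placeholder eight-residue pocket strings, NOT real NRPS data.
    reference_pockets = {
        "L-Cys": "ACDEFGHI",
        "L-Thr": "MNPQRSTV",
    }
    query_pocket = "ACDEFGHK"  # hypothetical query module
    substrate, score = predict_substrate(query_pocket, reference_pockets)
    print(substrate, score)
```

With the placeholder data the query matches the "L-Cys" reference at seven of eight positions. In practice a prediction of this kind would still need experimental confirmation, as was done in the text with the ATP-PPi exchange assay.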
Using the substrate specificity of individual NRPS and PKS modules as a
guide, we can align the nine NRPS modules and one PKS module to constitute the Blm
megasynthetase as shown in Fig. 1B according to our hybrid NRPS/PKS/NRPS model for
BLM biosynthesis (Fig. 1A). Among all the PKS or NRPS systems examined so far, the
Blm megasynthetase consists of the largest number of individual proteins. The precise
interactions among all the Blm NRPS and Blm PKS proteins to constitute the Blm
megasynthetase complex, therefore, reflect a remarkable power of protein-protein
recognition (Guenzi et al. (1998) J. Biol. Chem. 273: 14403-14410; Gokhale et al. (1999)
Science 284: 482-485). Although we have yet to provide direct evidence supporting the
specific protein-protein interactions between the neighboring proteins, it is striking to note
that all the biosynthetic intermediates isolated are derailed from either PKS or NRPS
modules at the junctions between the interacting proteins (Fig. 1B). Since it is not difficult
to imagine that an intermediate is more likely to fall off the enzyme complex when it is
subjected to interpeptide transfer than to intrapeptide transfer, we view the latter observation
as strong evidence supporting the current model of the Blm megasynthetase.

BlmIX/BlmVIII/BlmVII as a hybrid NRPS/PKS/NRPS model.

Recent biosynthetic studies on rapamycin in Streptomyces hygroscopicus
(Konig et al. (1997) Eur. J. Biochem. 247: 526-534), yersiniabactin in Yersinia
enterocolitica and Y. pestis (Pelludat et al. (1998) J. Bacteriol. 180: 538-546; Gehring et al.
(1998) Chem. Biol. 5: 573-586; Gehring et al. (1998) Biochemistry 37: 11637-11650), and
TA in Myxococcus xanthus (Paitan et al. (1999) J. Mol. Biol. 286: 465-474) are starting to
shed light on hybrid peptide and polyketide biosynthesis. Two models are emerging for the
alignment between an NRPS and a PKS module. The interacting NRPS and PKS modules
could be either covalently linked by arranging all domains in a linear order on the same
protein (Pelludat et al. (1998) J. Bacteriol. 180: 538-546; Gehring et al. (1998) Chem. Biol.
5: 573-586; Gehring et al. (1998) Biochemistry 37: 11637-11650; Paitan et al. (1999) J. Mol.
Biol. 286: 465-474) or physically located on two separate proteins, requiring specific
protein-protein recognition to ensure the correct pairing between the interacting modules
(Pelludat et al. (1998) J. Bacteriol. 180: 538-546; Konig et al. (1997) Eur. J. Biochem. 247:
526-534; Gehring et al. (1998) Chem. Biol. 5: 573-586; Gehring et al. (1998) Biochemistry
37: 11637-11650). Common to all these systems, however, are the unusual features associated
with the interacting modules, such as the lack of the AT domain of the PKS module in Tal
(Paitan et al. (1999) J. Mol. Biol. 286: 465-474) and the lack of the A domain and the
presence of the Cy domain of the NRPS modules in both HMWP1 and HMWP2 (Pelludat et al.
(1998) J. Bacteriol. 180: 538-546; Gehring et al. (1998) Chem. Biol. 5: 573-586; Gehring et
al. (1998) Biochemistry 37: 11637-11650). While extremely intriguing, the latter features
complicate mechanistic analysis of these systems, making them less ideal candidates for
studying how NRPS and PKS integrate into a productive hybrid NRPS/PKS complex.

The BlmIX/BlmVIII/BlmVII system combines the features of both hybrid
NRPS/PKS and PKS/NRPS systems, serving as an ideal model for studying hybrid peptide
and polyketide biosynthesis. The fact that the BlmIX and BlmVII NRPS modules and
the BlmVIII PKS module are three separate proteins, each with a typical domain
organization for NRPS and PKS enzymes, greatly simplifies the mechanistic analysis of the
hybrid NRPS/PKS/NRPS complex. We have found that the KS domain of BlmVIII is more
similar to the KSs of HMWP1 (Pelludat et al. (1998) J. Bacteriol. 180: 538-546) and Tal
(Paitan et al. (1999) J. Mol. Biol. 286: 465-474), both of which catalyze the elongation of a
peptidyl intermediate with a malonate, than to KSs of type I PKSs. We attribute these subtle
differences to their unique reactivity that catalyzes the transfer of the peptidyl intermediate
from the PCP to the KS domain, which presumably takes place prior to chain elongation
(Fig. 4). Subsequent condensation catalyzed by the KS domain between the peptidyl
intermediate and malonyl-S-ACP results in the elongation of the growing peptide with a
carboxylic acid. Equally striking are the discoveries that the ACP domain of BlmVIII is
more similar to a PCP than to an ACP and that the C domain of BlmVII has an additional
N-terminal segment of about 50 amino acids that is rich in arginine, aspartic acid, and
glutamic acid. The latter feature is analogous to the N-terminal interpolypeptide linker for
type I PKS, which has recently been demonstrated to play a critical role in intermodular
communication (Gokhale et al. (1999) Science 284: 482-485). We propose that these unique
features of the ACP domain from the BlmVIII PKS module and the C domain from the BlmVII
NRPS module provide the molecular basis for the C domain to recognize the acyl-S-ACP as a
substrate. Subsequent condensation catalyzed by the C domain between acyl-S-ACP and
amino acyl-S-PCP results in the elongation of the growing polyketide (as far as this
condensation is concerned) with an amino acid (Fig. 4).

Novel domains for the Blm NRPS and PKS modules.

Various NRPS and PKS domains have been characterized, which are the
building blocks for the entire field of combinatorial biosynthesis. The success of
combinatorial biosynthesis depends critically upon the repertoire of these individual
domains. Genetic analysis of the blm gene cluster has uncovered several novel NRPS and
PKS domains. Without being bound to a particular theory, it is believed that BlmVI and
BlmV are involved in the biosynthesis of the β-aminoalaninamide and pyrimidine moieties of
BLM. In addition, the MT domain in BlmVIII, the Cy domains in BlmIV, and the Ox
domain in BlmIII are novel domains.
The BlmVIII PKS module apparently furnishes the "propionate" unit into
BLM in two steps by evolving a malonyl CoA-specifying AT domain coupled with a novel
S-adenosylmethionine-requiring MT domain, representing a new mechanism to introduce
methyl branches into polyketides (Fig. 4). This biosynthetic reaction sequence is
unprecedented for polyketide biosynthesis since all PKSs from actinomycetes examined to
date incorporate the alkyl branches into the resultant polyketides by selecting various alkyl
malonates as the extending units that are determined by the AT domains. Yet, feeding
experiments have unambiguously established that the polyketide moiety of BLM was
derived from an acetate and a methionine (Takita and Muroka (1990) pages 289-309 in
Biochemistry of Peptide Antibiotics: Recent Advances in the Biotechnology of β-Lactams and
Microbial Peptides, Kleinkauf, H. & von Dohren, H. eds., W. de Gruyter, N.Y.), a fact that
fits well with the observed unusual domain organization of the BlmVIII PKS module (Fig.
4). It is conceivable that the combination of this MT domain with an AT domain specific for
a methyl malonate extending unit (Haydock et al. (1995) FEBS Lett. 374: 246-248) could
result in the synthesis of polyketides with a gem-dimethyl moiety via engineering polyketide
biosynthesis. Such a gem-dimethyl group has been found to be a very important
pharmacophore for the epothilones, a family of hybrid peptide and polyketide metabolites
that exhibits a remarkable antitumor activity similar to taxol (Ojima et al. (1999) Proc.
Natl. Acad. Sci. USA 96: 4256-4261).

The BlmIV and BlmIII NRPSs are characterized by the unusual Cy domains
as well as the unprecedented Ox domain, providing an efficient biosynthesis for a bithiazole
structure. The Cy domain was first defined by Marahiel and co-workers in their study of
bacitracin biosynthesis in B. licheniformis (Konz et al. (1997) Chem. Biol. 4: 927-937), and
the Cy activity was demonstrated recently by Walsh and co-workers in their study of the
HMWP1 and HMWP2 proteins for yersiniabactin biosynthesis in Y. pestis (Gehring et al.
(1998) Chem. Biol. 5: 573-586; Gehring et al. (1998) Biochemistry 37: 11637-11650).
While thiazoline is the direct product of the Cy domain, the thiazoline-to-thiazole conversion
requires an additional oxidation step. We identified at the C-terminus of NRPS-0 an
additional domain that shows low, but significant, sequence homology to a family of putative
oxidases/dehydrogenases, including the McbC protein of the microcin B17 synthase (Table
1). Microcin B17 synthase catalyzes the synthesis of the oxazole- and thiazole-containing
peptide antibiotic microcin B17, and McbC has been proposed to play a role in catalyzing the
oxazoline/thiazoline-to-oxazole/thiazole conversion (Li et al. (1996) Science 274:
1188-1193; Milne et al. (1999) Biochemistry 38: 4768-4781). Consequently, we propose that
this extra domain at the C-terminus of NRPS-0 could provide the oxidase/dehydrogenase
activity needed for the biosynthesis of the bithiazole moiety of BLM, defining a novel Ox
domain for NRPSs. It is noteworthy that a cell-free preparation from Sv ATCC 15003 has been
reported to catalyze the conversion of phleomycins to BLMs in the presence of NAD+ (Takita
and Muroka (1990) pages 289-309 in Biochemistry of Peptide Antibiotics: Recent Advances in
the Biotechnology of β-Lactams and Microbial Peptides, Kleinkauf, H. & von Dohren, H.
eds., W. de Gruyter, N.Y.), supporting the hypothesis that the bithiazole moiety of BLM
results from stepwise oxidations of a bithiazoline precursor (Fig. 1A). (The phleomycin
producer could be imagined to result from the loss of its Ox activity for the first thiazoline
ring.) Given the wide distribution of thiazole or oxazole rings in natural products (Ojima et
al. (1999) Proc. Natl. Acad. Sci. USA 96: 4256-4261; Li et al. (1996) Science 274:
1188-1193) exhibiting an impressive array of biological activities, the cloning of the blmIV
and blmIII genes and the identification of the Ox domain open many opportunities to define
the mechanism for thiazole biosynthesis and to potentially synthesize novel
thiazole-containing molecules by engineering peptide biosynthesis.

Example 2

Identification and characterization of a type II peptidyl carrier protein from the
bleomycin producer Streptomyces verticillus ATCC 15003.

Results. 

Cloning and sequence analysis of the blmI gene

In our effort to clone the gene cluster responsible for BLM biosynthesis, we
have determined 80 kb of DNA sequence from Sv ATCC 15003 (Fig. 8). Among the orfs
identified within the blm gene cluster is the small orf of 273 base pairs (bp), blmI, which is
located approximately 4 kb upstream of the previously characterized blmAB resistance locus
(Sugiyama et al. (1994) Gene 151: 11-16; Calcutt and Schmidt (1994) Gene 151: 17-21)
(Fig. 8B). The blmI gene encodes a protein of 90 amino acids with a molecular weight of
9957 and a pI of 6.52 (Fig. 8C). Computer-assisted analysis (Altschul et al. (1997) Nucleic
Acids Res. 25: 3389-3402) of the deduced amino acid sequence indicates that BlmI is very
similar to various PCP domains of NRPSs (around 40% identity and 60% similarity,
as shown in Figure 9). Like known PCP domains of NRPSs, BlmI has the highly conserved
signature motif of LGGXS, within which the serine residue is the site for
4'-phosphopantetheinylation (Stachelhaus and Marahiel (1995) FEMS Microbiol. Lett. 125:
3-14; Marahiel et al. (1997) Chem. Rev. 97: 2651-2673). The latter posttranslational
modification is generally necessary for peptide biosynthesis, converting the apo-PCP into the
functional holo-PCP (Marahiel et al. (1997) Chem. Rev. 97: 2651-2673; Walsh et al. (1997)
Curr. Opin. Chem. Biol. 1: 309-315). Based on sequence comparison, BlmI is most closely
related to PCPs and not to other kinds of carrier proteins that also share the same LGGXS
(SEQ ID NO:80) motif and undergo the same posttranslational 4'-phosphopantetheinylation
[31], such as the E. coli acyl carrier protein (ACP) (Lambalot and Walsh (1995) J. Biol.
Chem. 270: 24658-24661), the ACP domain of type I PKS and the type II PKS ACP (Cox and
Simpson (1997) FEBS Lett. 405: 267-272; Carreras et al. (1997) Biochemistry 36:
11757-11761), the ArCP domain (Gehring et al. (1998) Biochemistry 37: 2648-2659), and
several nodulation-related ACP-like proteins (Epple et al. (1998) J. Bacteriol. 180:
4950-4954; Spaink et al. (1991) Nature 354: 125-130).
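
As a purely illustrative aid, a search for the LGGXS signature motif in a deduced protein sequence can be sketched in Python as follows. The function name `find_motif_serines` and the example sequence are hypothetical; the example string is an invented placeholder and is not the BlmI sequence of Fig. 8C. The sketch assumes that X stands for any single residue, as the motif is written in the text.

```python
import re
from typing import List

# LGGXS, with X taken to mean any one residue (an assumption of this sketch).
PCP_MOTIF = re.compile(r"LGG.S")


def find_motif_serines(protein: str) -> List[int]:
    """Return the 1-based position of the serine in every LGGXS match."""
    # m.start() is the 0-based index of the leading L; the serine sits four
    # residues further on, so its 1-based position is m.start() + 5.
    return [m.start() + 5 for m in PCP_MOTIF.finditer(protein.upper())]


if __name__ == "__main__":
    placeholder = "MKLGGDSLAAV"  # invented sequence for illustration only
    print(find_motif_serines(placeholder))
```

A motif hit of this kind only flags a candidate 4'-phosphopantetheinylation site. As the text notes, ACPs and ArCPs share the same motif, so assignment as a PCP rests on the broader sequence comparison and on the biochemical experiments that follow.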
Overexpression of blmI in E. coli

To overexpress the blmI gene in E. coli, we directly amplified the blmI gene
by PCR from the Sv ATCC 15003 genomic DNA and cloned it into the pQE-60 expression
vector to give pBS1 so that BlmI could be produced as a protein with a native N-terminus
and a His6-tag at its C-terminus. However, no production of the BlmI protein was detected,
as judged by sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE), upon
introduction of pBS1 into E. coli M15(pREP4) under the standard overexpression conditions
recommended by the manufacturer (Qiagen). We reasoned that the small BlmI protein with
its native N-terminus may not be stable in the heterologous host, and hence moved the blmI
gene from pBS1 into pET-29a to yield the second overexpression construct, pBS2. In the
latter construct, BlmI should be produced as a fusion protein with 27 extra amino acid
residues at its N-terminus, including an S-tag and the thrombin cleavage site, in addition to
the His6-tag at its C-terminus. Introduction of pBS2 into E. coli BL21(DE3) under the
standard overexpression conditions recommended by the manufacturer (Novagen) indeed
resulted in overproduction of BlmI. In fact, the bulk of the soluble protein was the
overproduced BlmI, which was easily purified by affinity chromatography using Ni-NTA
resin (Qiagen). It is noteworthy that fusion of the additional 23 amino acids to the
N-terminus of BlmI as in pBS2 and change of the expression system from E. coli
M15(pREP4)(pBS1) to E. coli BL21(DE3)(pBS2) dramatically improved the expression level
of blmI.

In vivo 4'-phosphopantetheinylation of the BlmI protein

To establish BlmI as a type II PCP, we tested if it could serve as a substrate
for a PCP-specific 4'-phosphopantetheinyl transferase (PPTase). PPTases catalyze the
posttranslational modification of an apo-PCP into a holo-PCP by transferring the
4'-phosphopantetheine moiety from coenzyme A (CoA) to the conserved serine residue of PCP,
and this reaction has been developed recently into a general method to prepare various
holo-PCP, holo-ACP, or holo-ArCP from the corresponding apoproteins (Stachelhaus et al.
(1996) Chem. Biol. 3: 913-921; Gehring et al. (1998) Biochemistry 37: 2648-2659; Gehring
et al. (1998) Biochemistry 37: 11637-11650; Weinreb et al. (1998) Biochemistry 37:
1575-1584). Therefore, we decided to investigate the 4'-phosphopantetheinylation of BlmI
under both in vivo (Ku et al. (1997) Chem. Biol. 4: 203-207) and in vitro (Gehring et al.
(1998) Biochemistry 37: 11637-11650; Lambalot et al. (1996) Chem. Biol. 3: 923-936;
Quadri et al. (1998) Biochemistry 37: 1585-1595) conditions.

To examine 4'-phosphopantetheinylation of BlmI in vivo, we chose E. coli
OG7001 as the expression host, which is a β-alanine auxotroph derived from E. coli
BL21(DE3) by P1 co-transduction of the panD mutation from E. coli SJ16 (Epple et al.
(1998) J. Bacteriol. 180: 4950-4954). Upon introduction of pBS2 into E. coli OG7001, blmI
was exceptionally well expressed and the overproduced BlmI protein was readily purified.
However, high performance liquid chromatography (HPLC) analysis showed that the
purified BlmI was essentially in the apo-form (Fig. 10A), indicating that apo-BlmI was a
poor substrate for the E. coli endogenous PPTases, such as EntD and ACP synthase
(Lambalot et al. (1996) Chem. Biol. 3: 923-936; Walsh et al. (1997) Curr. Opin. Chem. Biol.
1: 309-315; Lambalot and Walsh (1995) J. Biol. Chem. 270: 24658-24661). To circumvent
the poor endogenous PPTase activity, we next co-expressed blmI with the gsp gene, which
was isolated from the gramicidin S producer Bacillus brevis and encodes a PPTase that is
known to 4'-phosphopantetheinylate heterologously produced PCPs in E. coli (Lambalot et
al. (1996) Chem. Biol. 3: 923-936; Ku et al. (1997) Chem. Biol. 4: 203-207). We
co-transformed pDPT-Gsp, in which the expression of the gsp gene was under the control of
the T5/Lac promoter (Ku et al. (1997) Chem. Biol. 4: 203-207), and pBS2 into E. coli OG7001.
BlmI was again very well expressed and the resulting BlmI protein was similarly purified.
HPLC analysis showed that at least 60% of the overproduced BlmI was modified into the
holo-BlmI protein (Fig. 10B). (A PCP domain was similarly 4'-phosphopantetheinylated in vivo
before by co-expressing gsp in E. coli using pDPT-Gsp, and approximately 80% of the PCP
was produced in the holo-form (Ku et al. (1997) Chem. Biol. 4: 203-207).)

We next cultured E. coli OG7001(pBS2) and E. coli OG7001(pBS2/pDPT-Gsp)
in the presence of [3-³H]-β-alanine, a known biosynthetic precursor of
4'-phosphopantetheine (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921; Epple et al. (1998)
J. Bacteriol. 180: 4950-4954). Specific incorporation of [3-³H]-β-alanine into the
4'-phosphopantetheine moiety of holo-BlmI was determined by autoradiographic analysis.
Thus, while fermentation of E. coli OG7001(pBS2) in the presence of [3-³H]-β-alanine led
to an IPTG-dependent overproduction of BlmI, little of the resulting BlmI protein was
³H-labeled, indicating that it was produced in the apo-form. In contrast, fermentation of
E. coli OG7001(pBS2/pDPT-Gsp) in the presence of [3-³H]-β-alanine resulted in a significant
increase of IPTG-dependent incorporation of the ³H-label into the overproduced BlmI
protein, suggesting a specific incorporation of [3-³H]-β-alanine into holo-BlmI, presumably
in the 4'-phosphopantetheine moiety. There were several additional proteins that were also
weakly labeled by [3-³H]-β-alanine. However, both their expression and their incorporation
of the ³H-label were independent of either IPTG induction or the presence of Gsp; hence
these proteins were unrelated to BlmI. (Similar background labeling was reported before for
in vivo 4'-phosphopantetheinylation of other PCPs (Epple et al. (1998) J. Bacteriol. 180:
4950-4954).) We also purified the BlmI protein from E. coli OG7001(pBS2/pDPT-Gsp) and
demonstrated that it was the holo-BlmI protein that was specifically associated with the
³H-activity. Finally, we confirmed the identity of holo-BlmI by subjecting the purified BlmI
protein to MALDI-TOF mass spectral analysis (Weinreb et al. (1998) Biochemistry 37:
1575-1584). BlmI produced in the absence of the Gsp PPTase yielded a single peak with a
molecular weight of 13,952, suggesting that the produced BlmI protein is in the apo-form
(calc. 13,949). In contrast, BlmI produced in the presence of Gsp yielded two species with
molecular weights of 13,969 and 14,303, respectively. While the species with the molecular
weight of 13,969 represents apo-BlmI, a molecular weight of 14,303 unambiguously
confirmed the other protein as holo-BlmI (calc. 14,289). The latter result indicated that the
purified BlmI consisted of both the apo- and holo-BlmI proteins, in agreement with the
HPLC analysis results (Fig. 10B).
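
The mass assignments above amount to simple arithmetic on the values quoted in the text, which the following Python sketch makes explicit. It is illustrative only; the names `APO_CALC`, `HOLO_CALC`, `OBSERVED`, `expected_shift`, and `assign_form` are hypothetical, and the nearest-value rule used to label each peak is an assumption of this sketch rather than the criterion applied in the experiments.

```python
# Calculated and observed molecular weights as quoted in the text.
APO_CALC = 13949
HOLO_CALC = 14289

OBSERVED = {
    "without Gsp": [13952],
    "with Gsp": [13969, 14303],
}


def expected_shift() -> int:
    """Mass added on going from the calculated apo to holo form."""
    return HOLO_CALC - APO_CALC


def assign_form(mass: float) -> str:
    """Label a peak by whichever calculated mass it lies closer to."""
    if abs(mass - APO_CALC) <= abs(mass - HOLO_CALC):
        return "apo"
    return "holo"


if __name__ == "__main__":
    print("expected apo-to-holo shift:", expected_shift())
    for sample, peaks in OBSERVED.items():
        for peak in peaks:
            print(sample, peak, assign_form(peak))
```

On the quoted values, the calculated apo-to-holo difference is 340 mass units, and the two species observed in the presence of Gsp differ by 334, so each observed peak falls nearest the form to which the text assigns it.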

In vitro 4'-phosphopantetheinylation of the BlmI protein

To investigate 4'-phosphopantetheinylation of BlmI in vitro, we chose the Sfp
protein as the preferred PPTase, which had been isolated before from the surfactin producer
Bacillus subtilis (Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321). (Overexpression of
gsp in E. coli using pDPT-Gsp resulted in predominantly an insoluble Gsp protein (Ku et al.
(1997) Chem. Biol. 4: 203-207).) The Sfp PPTase was overproduced in E. coli
MV1190(pUC8-Sfp) and purified to near homogeneity as described before (Quadri et al.
(1998) Biochemistry 37: 1585-1595; Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321).
Upon incubation of the purified apo-BlmI with [³H-pantetheine]-CoA in the presence of the
Sfp PPTase, we examined the covalent incorporation of the
[³H-pantetheine]-4'-phosphopantetheine moiety from CoA into holo-BlmI by autoradiographic
analysis. Indeed, the apo-BlmI was quantitatively labeled by [³H-pantetheine]-CoA, and no
labeling was observed in the absence of either the apo-BlmI or the Sfp PPTase protein,
demonstrating that the Sfp PPTase can recognize apo-BlmI as a substrate and specifically
transfer the 4'-phosphopantetheine group from CoA to yield holo-BlmI.

In vitro aminoacylation of BlmI

Once we established BlmI as a type II PCP that can be readily modified by
PCP-specific PPTases into the holo-BlmI protein, we tested if the holo-BlmI could be
aminoacylated in trans, which requires an A domain. Since BlmI has no cognate A domain of
its own, we turned our attention to another putative biosynthesis gene cluster we have cloned
previously from Sv ATCC 15003, which encodes at least four NRPS modules and one PKS module.
We have established that this gene cluster is not clustered with the blm locus and is unrelated
to BLM biosynthesis. From this gene cluster, we amplified by PCR a 1579 bp fragment
encoding an A domain, named Val-A, which we predicted to have a molecular weight of
56,581 and a pI of 7.39. We cloned val-A into pET-28a to yield pBS3, in which Val-A
would be produced as a fusion protein with a His6-tag at the N-terminus. Introduction of
pBS3 into E. coli BL21(DE3) under the standard overexpression conditions recommended
by the manufacturer (Novagen) resulted in good overproduction of Val-A, predominantly in
soluble form, from which Val-A was purified by affinity chromatography using Ni-NTA
resin. The purified Val-A protein was active by the amino acid-dependent ATP-PPi
exchange assay (Lee and Lipmann (1970) Meth. Enzymol. 43: 585-602; Ku et al. (1997)
Chem. Biol. 4: 203-207). Among the 23 amino acids tested, Val-A specifically activated
valine, an amino acid that is not required for BLM biosynthesis.

To carry out the aminoacylation in trans, we incubated the purified holo-BlmI
and Val-A in vitro in the presence of L-[14C(U)]valine and ATP (Stachelhaus et al. (1996)
Chem. Biol. 3: 913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584). The
aminoacylated holo-BlmI-L-[14C(U)]valine species was subjected to SDS-PAGE, and specific
attachment of L-[14C(U)]valine to holo-BlmI was determined by autoradiographic analysis.
Remarkably, holo-BlmI was specifically labeled by L-[14C(U)]valine in the presence of
Val-A, indicative of the formation of the holo-BlmI-S-valine thioester. The in trans
aminoacylation between the holo-BlmI and Val-A proteins appeared to be very specific.
Neither incubation of L-[14C(U)]valine with Val-A, apo-BlmI, or holo-BlmI
alone, nor incubation of L-[14C(U)]valine with Val-A and apo-BlmI together, resulted in
the detection of 14C-labeled BlmI protein.

Discussion.

Nonribosomal peptides and polyketides are two distinct classes of natural
products, yet they are assembled from amino acids and short carboxylic acids by NRPSs and
PKSs, respectively, by strikingly similar strategies (Cane et al. (1998) Science 282: 63-68).
These fascinating multifunctional enzyme complexes have been classified into two types
based on their gene organization and enzyme architecture. Type I enzymes are
multifunctional proteins consisting of domains for individual enzyme activities, and type II
enzymes are multienzyme complexes consisting of discrete proteins that are largely
monofunctional. While both type I and type II PKSs (Fig. 11A and 11C) have been well
characterized and account for the vast structural diversity found in polyketide biosynthesis
(Hopwood (1997) Chem. Rev. 97: 2465-2497), all NRPSs studied so far are exclusively
type I modular enzymes (Fig. 11B) (Kleinkauf and von Dohren (1996) Eur. J. Biochem.
236: 335-351; Marahiel et al. (1997) Chem. Rev. 97: 2651-2673; von Dohren et al. (1997)
Chem. Rev. 97: 2675-2705). It is very tempting to speculate on the existence of a type II NRPS
that, analogous to type II PKSs (Shen and Hutchinson (1993) Science 262: 1535-1540; Bao et
al. (1998) Biochemistry 37: 8132-8138; Carreras and Khosla (1998) Biochemistry 37: 2084-2088),
should consist of discrete proteins possessing enzyme activities such as the A
(Stachelhaus and Marahiel (1995) J. Biol. Chem. 270: 6163-6169), the PCP (Stein and Morris
(1996) J. Biol. Chem. 271: 15428-15435), or the C (Stachelhaus et al. (1998) J. Biol. Chem.
273: 22773-22781) domains of type I NRPSs (Fig. 11D). The fact that both the A
(Stachelhaus and Marahiel (1995) J. Biol. Chem. 270: 6163-6169; Konz et al. (1997) Chem.
Biol. 4: 927-937; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Mootz and Marahiel
(1997) J. Bacteriol. 179: 6843-6850) and the PCP (Stachelhaus et al. (1996) Chem. Biol. 3:
913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Pfeifer et al. (1995)
Biochemistry 34: 7450-7459; Haese et al. (1994) J. Mol. Biol. 243: 116-122; Lambalot et al.
(1996) Chem. Biol. 3: 923-936; Quadri et al. (1998) Biochemistry 37: 1585-1595; Gehring et
al. (1996) Chem. Biol. 4: 17-24; Ku et al. (1997) Chem. Biol. 4: 203-207) domains of type I
NRPSs can act as independent enzymes supports the hypothesis of a type II NRPS.

We have now cloned and sequenced the blmI gene, overproduced and
characterized the BlmI protein as a bona fide type II PCP, and demonstrated that holo-BlmI
can be aminoacylated by a completely unrelated A domain, providing for the first time
genetic and biochemical evidence for a type II NRPS enzyme. We concluded that BlmI is a type
II PCP based on the following criteria. (1) The deduced amino acid sequence of the blmI
gene is highly homologous to various PCP domains of known NRPSs, in particular at the
signature motif LGGXS, within which the 4'-phosphopantetheine prosthetic group is
covalently attached to the serine residue (Marahiel et al. (1997) Chem. Rev. 97: 2651-2673;
Stachelhaus and Marahiel (1995) FEMS Microbiol. Lett. 125: 3-14). While the current
boundaries for a PCP domain in the literature were defined arbitrarily (Stachelhaus et al.
(1996) Chem. Biol. 3: 913-921) and varied from one PCP to another, we can now re-define a
PCP domain for the type I NRPS as a 90 amino acid peptide with approximately 45 amino
acids flanking each side of the essential serine residue in the LGGXS (SEQ ID NO:81) motif, in
light of this discrete BlmI type II PCP (Fig. 9). (2) The blmI gene has been successfully
expressed in E. coli, and fusion of a short peptide to the N-terminus of BlmI dramatically
improved its overproduction efficiency. While we cannot exclude the effect of different
systems on gene expression, i.e., E. coli M15(pREP4)(pBS1) vs. E. coli BL21(DE3)(pBS2),
we attribute the increase in expression efficiency to the stability of BlmI as an N-terminal
fusion protein instead of the otherwise labile BlmI protein with its native N-terminus. Since
BlmI was produced predominantly in the apo-form in E. coli, apo-BlmI apparently was not a
substrate for the endogenous PPTases, such as EntD or ACP synthase, excluding BlmI as an
ArCP or ACP, respectively. EntD and ACP synthase are known to 4'-phosphopantetheinylate
apo-ArCP and apo-ACP, respectively, to their holo-forms efficiently
(Lambalot et al. (1996) Chem. Biol. 3: 923-936; Walsh et al. (1997) Curr. Opin. Chem. Biol.
1: 309-315; Lambalot and Walsh (1995) J. Biol. Chem. 270: 24658-24661). (3) The apo-BlmI
protein serves as a substrate for PCP-specific PPTases that transfer the
4'-phosphopantetheine moiety from CoA to apo-BlmI to yield the holo-BlmI protein. We have
demonstrated this posttranslational modification for BlmI in vivo with the Gsp PPTase (Ku
et al. (1997) Chem. Biol. 4: 203-207) and in vitro with the Sfp PPTase (Gehring et al. (1998)
Biochemistry 37: 11637-11650; Lambalot et al. (1996) Chem. Biol. 3: 923-936; Quadri et al.
(1998) Biochemistry 37: 1585-1595), both of which have been extensively used in preparing
holo-PCPs. (4) The specific modification of apo-BlmI by 4'-phosphopantetheinylation has
been monitored by HPLC analysis (Fig. 10) (Weinreb et al. (1998) Biochemistry 37: 1575-1584)
and by specific incorporation of [3-3H]-β-alanine in vivo (Stachelhaus et al. (1996)
Chem. Biol. 3: 913-921; Ku et al. (1997) Chem. Biol. 4: 203-207; Epple et al. (1998) J.
Bacteriol. 180: 4950-4954) and of [3H-pantetheine]-CoA in vitro (Gehring et al. (1998)
Biochemistry 37: 11637-11650; Lambalot et al. (1996) Chem. Biol. 3: 923-936; Quadri et al.
(1998) Biochemistry 37: 1585-1595), respectively, into the 4'-phosphopantetheine moiety of
the holo-BlmI protein. The identity of BlmI was finally confirmed by MALDI-TOF mass
spectral analysis, which determined the molecular weight of both the apo- and holo-BlmI
proteins.
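
The domain definition in criterion (1) can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function name `pcp_windows` and the toy sequence are hypothetical and are not part of the disclosed sequences. The sketch locates each LGGXS motif in a protein sequence and extracts the serine together with up to 45 residues on either side.

```python
import re

# Signature motif L-G-G-X-S; the final serine is the residue that carries
# the 4'-phosphopantetheine prosthetic group.
PCP_MOTIF = re.compile(r"LGG.S")


def pcp_windows(protein, flank=45):
    """Return (serine_index, window) for every LGGXS hit.

    serine_index is 0-based; window holds the serine plus up to `flank`
    residues on each side (fewer if the hit lies near a terminus).
    """
    hits = []
    for match in PCP_MOTIF.finditer(protein):
        ser = match.end() - 1                  # position of the motif serine
        start = max(0, ser - flank)
        end = min(len(protein), ser + flank + 1)
        hits.append((ser, protein[start:end]))
    return hits


# Hypothetical toy sequence (NOT BlmI): 51 leading residues, the motif,
# then 60 trailing residues.
toy = "M" + "A" * 50 + "LGGDS" + "K" * 60
hits = pcp_windows(toy)
```

With the default flank the window is 91 residues long (45 + serine + 45), which corresponds to the approximately 90 amino acid domain described above.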

While individual domains of type I NRPSs can function independently and
several A (Stachelhaus and Marahiel (1995) J. Biol. Chem. 270: 6163-6169; Konz et al.
(1997) Chem. Biol. 4: 927-937; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Mootz
and Marahiel (1997) J. Bacteriol. 179: 6843-6850) and PCP (Stachelhaus et al. (1996)
Chem. Biol. 3: 913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584; Pfeifer et al.
(1995) Biochemistry 34: 7450-7459; Haese et al. (1994) J. Mol. Biol. 243: 116-122;
Lambalot et al. (1996) Chem. Biol. 3: 923-936; Quadri et al. (1998) Biochemistry 37: 1585-1595;
Gehring et al. (1996) Chem. Biol. 4: 17-24; Ku et al. (1997) Chem. Biol. 4: 203-207)
domains have been overproduced, purified, and biochemically characterized, aminoacylation
in trans has been successful only between PCPs and their cognate A domains (Stachelhaus et
al. (1996) Chem. Biol. 3: 913-921; Weinreb et al. (1998) Biochemistry 37: 1575-1584). No
aminoacylation between PCP and A domains from different NRPS modules has been
observed. These results led to the conclusion that there is a specific protein-protein
recognition between the A domain and its cognate PCP (Weinreb et al. (1998) Biochemistry
37: 1575-1584). Such domain-specific aminoacylation, in fact, should be beneficial in
maintaining the fidelity of a type I NRPS by providing additional "gating" against
misincorporation of non-specifically activated aminoacyl adenylate into the final peptide
product. Since a type II PCP such as BlmI lacks a cognate A domain, we asked whether BlmI
could be aminoacylated by an unrelated A domain of a type I NRPS. Although we have yet
to determine the biochemical role of BlmI in vivo, the fact that the blmI gene is located in the
middle of the blm gene cluster suggests that it may be involved in BLM biosynthesis. To
avoid the ambiguity of selecting an A domain that may potentially interact with BlmI in
vivo, we preferred not to choose any A domain from the blm gene cluster to test whether it could
aminoacylate BlmI in trans. We reasoned that an A domain that is unrelated to BlmI should
come from a gene cluster independent of BLM biosynthesis and should activate an amino
acid not required by BLM. We chose Val-A because it satisfied both requirements. Val-A is
an A domain of a type I NRPS from a gene cluster that we had cloned previously from Sv
ATCC15003 and that has proven to be unrelated to BLM biosynthesis, and it specifically
activates valine among the 23 amino acids tested. Remarkably, BlmI was efficiently
aminoacylated by Val-A. The valine residue is specifically attached in a thioester linkage to
the terminal -SH of the 4'-phosphopantetheine moiety of the holo-BlmI protein, as evidenced
by the fact that apo-BlmI was inactive under identical conditions.

Aminoacylation of holo-BlmI by Val-A represents the first example in which
an A domain aminoacylates a protein other than its cognate PCP domain. Since it has been
suggested that an A domain of a type I NRPS can transfer the activated aminoacyl adenylate
only to its cognate PCP domain because of the specific protein-protein recognition between
the two domains (Weinreb et al. (1998) Biochemistry 37: 1575-1584), the fact that BlmI is
aminoacylated by Val-A reveals a distinct feature of a type II PCP. It is very tempting to
speculate that type II PCPs such as BlmI may have broad intrinsic substrate specificity
toward the aminoacyl adenylate, the A domain, or both. In fact, the latter feature is
reminiscent of the type II PKS ACPs, which have been shown to be interchangeable among
different PKS complexes (Shen and Hutchinson (1993) Science 262: 1535-1540; Bao et al.
(1998) Biochemistry 37: 8132-8138; Carreras and Khosla (1998) Biochemistry 37: 2084-2088).
The biosynthesis of D-alanyl-lipoteichoic acid in Bacillus subtilis (Perego et al.
(1995) J. Biol. Chem. 270: 15598-15606) and Lactobacillus casei (Debabov et al. (1996)
178: 3869-3876) also involves a discrete ACP-like protein, the D-alanyl carrier protein,
although the latter clearly is structurally and functionally different from PCPs.

The results strongly suggest the existence of a type II NRPS. In fact, we have
already identified within the blm gene cluster two additional genes, blmII and blmXI (Fig.
1B), which encode type II C proteins based on sequence analysis (see Example 1).

Significance. 

All NRPSs known to date are exclusively type I modular enzymes, that is,
multifunctional proteins consisting of domains, such as A (Stachelhaus and Marahiel (1995) J.
Biol. Chem. 270: 6163-6169), PCP (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921), and C
(Stachelhaus et al. (1998) J. Biol. Chem. 273: 22773-22781), for individual enzyme activities
(Kleinkauf and von Dohren (1996) Eur. J. Biochem. 236: 335-351; Marahiel et al. (1997)
Chem. Rev. 97: 2651-2673; von Dohren et al. (1997) Chem. Rev. 97: 2675-2705), and they
control the structural variations of the resulting peptide products by the multiple-carrier
thiotemplate mechanism (Cane et al. (1998) Science 282: 63-68; Stein and Morris (1996) J.
Biol. Chem. 271: 15428-15435). While individual domains of type I NRPSs can function
independently, aminoacylation in trans has been successful only between PCPs and their
cognate A domains (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921; Weinreb et al. (1998)
Biochemistry 37: 1575-1584). We have cloned and sequenced the blmI gene, overproduced
and characterized the BlmI protein as a bona fide type II PCP, and demonstrated that
holo-BlmI can be aminoacylated by a completely unrelated A domain. Our results provide
for the first time genetic and biochemical evidence to support the hypothesis of a type II
NRPS, setting the stage for formulating new research concepts to study peptide biosynthesis.
Genetic manipulation of type I NRPSs has already been successful in generating novel
peptides (Stachelhaus et al. (1995) Science 269: 69-72). An unprecedented type II NRPS
should shed new light on engineering NRPS proteins, greatly increasing our ability to access
peptides with even greater structural diversity.

Materials and methods 

General DNA manipulations 

Plasmid preparation and DNA extraction were carried out using
commercial kits (Qiagen, Santa Clarita, CA), and all other manipulations were carried out
according to standard methods (Sambrook et al. (1989) Molecular Cloning: A Laboratory
Manual (2nd ed.), Cold Spring Harbor Laboratory Press, Cold Spring Harbor, USA). E. coli
strain DH5α was used as the host for general DNA propagation.

Overexpression of blmI in E. coli and purification of the BlmI protein

The blmI gene was amplified from Sv ATCC15003 by PCR using a forward
primer of 5'-CCG CCC ATG GGT GCT CCG CGT GGC GAG CGG ACC CGG CGC-3'
(SEQ ID NO:82, the NcoI site is underlined) and a reverse primer of 3'-CCT AGA TCT
CCG GTC CCG CTC CCC CGT-5' (SEQ ID NO:83, the BglII site is underlined). In order
to create the NcoI site, the original starting sequence "ATG AGC" was changed to
"ATG GGT", which changed the second amino acid from serine to glycine.
The first five codons of blmI were also optimized for overexpression in E. coli. The
PCR-amplified 0.3 kb NcoI-BglII fragment was cloned into the similar sites of pQE-60 (Qiagen)
to form pBS1. Digestion of pBS1 with NcoI and HindIII and cloning of the resulting 0.3 kb
NcoI-HindIII fragment into the same sites of pET-29a (Novagen, Madison, WI) yielded
pBS2.

Expression of blmI in E. coli M15(pREP4)(pBS1) and in E. coli
BL21(DE3)(pBS2) and purification of the resulting BlmI protein by affinity chromatography on
Ni-NTA resin were carried out under the standard conditions recommended by Qiagen and
Novagen, respectively. The incubation temperature was lowered to 30 °C to improve
solubility. The purification of BlmI was monitored by SDS-PAGE on a 15% gel. The final
pure BlmI protein was desalted on a PD-10 column (Sephadex G-25, Pharmacia Biotech,
Piscataway, NJ) into 50 mM sodium phosphate buffer, pH 7.8, containing 200 mM NaCl, 10
mM MgCl2, 2 mM dithiothreitol (DTT), 1 mM EDTA, and 10% glycerol, and stored at -80 °C
for in vitro assays.
HPLC analysis and MALDI-TOF mass spectral determination

Samples of BlmI (30-70 µg) purified from E. coli OG7001(pBS2) or E. coli
OG7001(pBS2/pDPT-Gsp) were analyzed on a Nova-Pak C18 column (5mm x 10, Waters,
Milford, MA) using a Rainin DMAX HPLC unit. The column was developed with a linear
gradient of 0-50% acetonitrile in 0.1% trifluoroacetic acid over 25 min, followed by an additional
5 min at 50% acetonitrile, with a flow rate of 0.6 ml/min and detection at 280 nm.
MALDI-TOF mass spectral determination was performed on a Bruker Biflex III spectrometer at the
Facility for Advanced Instrumentation of the University of California, Davis.

In vivo labeling of BlmI with [3-3H]-β-alanine

The β-alanine auxotroph E. coli strain OG7001 (Epple et al. (1998) J.
Bacteriol. 180: 4950-4954) was transformed with pBS2 and cultured under the same
conditions as for E. coli BL21(DE3) (Novagen). For co-expression of blmI with gsp,
pDPT-Gsp (Ku et al. (1997) Chem. Biol. 4: 203-207) was similarly transformed into E. coli
OG7001(pBS2) and the transformants were cultured in 2xYT (Debabov et al. (1996) 178:
3869-3876) in the presence of kanamycin (25 µg/ml) and chloramphenicol (50 µg/ml). For
the in vivo labeling experiment, cells from 2 ml of overnight culture of either E. coli
OG7001(pBS2) or E. coli OG7001(pBS2/pDPT-Gsp) were harvested, washed with M9
minimal medium (Debabov et al. (1996) 178: 3869-3876), and re-suspended in 2 ml of M9
minimal medium. The latter were used as seed cultures (20 to inoculate 1 ml M9
medium with kanamycin (25 µg/ml) or kanamycin (25 µg/ml) and chloramphenicol (50
µg/ml) for E. coli OG7001(pBS2) or E. coli OG7001(pBS2/pDPT-Gsp), respectively. The
resulting culture was incubated at 30 °C, 250 rpm to an OD600nm of 0.6, and to this was added 10
µCi of [3-3H]-β-alanine (50 Ci/mmol, American Radiolabeled Chemicals Inc., St. Louis, MO)
with or without IPTG (1 mM). Total proteins were resolved by SDS-PAGE on 15% gels
that were stained with Coomassie blue. To determine 3H-labeling of the overproduced holo-BlmI
protein, gels were soaked in Amplifier (Amersham, Arlington Heights, IL) for 20 min, dried
between two sheets of cellulose membrane (KOH Development Inc., Ann Arbor, MI), and
visualized by autoradiography on X-ray films (Fuji Medical Systems, Stamford, CT).

In vitro labeling of BlmI with [3H-pantetheine]-CoA

Expression of sfp in E. coli MV1190(pUC8-Sfp), purification of the Sfp
PPTase to homogeneity, and 4'-phosphopantetheinylation of apo-BlmI by Sfp in vitro were
carried out essentially according to literature procedures (Quadri et al. (1998) Biochemistry
37: 1585-1595; Nakano et al. (1992) Mol. Gen. Genet. 232: 313-321). A typical 100 µl
assay solution contained 26 µM apo-BlmI, 2.9 µM Sfp, 25 µM [3H-pantetheine]-CoA (0.9
µCi, 40 Ci/mmol), 10 mM MgCl2, and 5 mM DTT in 75 mM MES/NaOAc buffer, pH 6.0.
After 30 min of incubation at 37 °C, the assays were stopped by addition of 5 µl of bovine
serum albumin (0.2 mg/ml) and 0.9 ml of cold 10% (v/v) trichloroacetic acid (TCA). The
precipitated proteins were collected by centrifugation at 14,000 rpm, 20 min, 4 °C
(Eppendorf 5415C centrifuge), washed with 10% TCA three times, and resolved by
SDS-PAGE on a 15% gel. The 3H-activity incorporated into holo-BlmI was similarly determined
by autoradiography as described for in vivo labeling of holo-BlmI with [3-3H]-β-alanine.

Overexpression of val-A in E. coli and purification and assay of the Val-A 
protein 

The val-A fragment was amplified from Sv ATCC15003 by PCR using a
forward primer of 5'-GGA ATT CCA TAT GGG CAC CAC CGT CGC CGC G-3' (SEQ ID
NO:84, the NdeI site is underlined), and a reverse primer of 3'-GGC AAG CTT GGG ACC
GGG CGT GGA GCG C (SEQ ID NO:85, the HindIII site is underlined). The
PCR-amplified 1.6 kb NdeI-HindIII fragment was cloned into the similar sites of pET-28a (Novagen)
to yield pBS3. Expression of val-A in E. coli BL21(DE3)(pBS3) and purification of the
resulting Val-A protein by affinity chromatography on Ni-NTA resin were carried out under
the standard conditions recommended by Novagen.

Amino acid-dependent ATP-PPi exchange assays were performed essentially according
to the literature procedures (Ku et al. (1997) Chem. Biol. 4: 203-207; Lee and Lipmann
(1970) Methods Enzymol. 43: 585-602). A typical 100 µl assay solution contained 180 nM
Val-A, 1 mM ATP, 0.1 mM PPi with 0.2 µCi of 32P-PPi (11.75 Ci/mmol, NEN Life Science
Products, Inc., Boston, MA), 1 mM MgCl2, 0.1 mM EDTA, and 1 mM L-amino acid in 50
mM sodium phosphate buffer, pH 7.8. After 30 min of incubation at 30 °C, the assays were
stopped by addition of 0.9 ml of cold 1% (w/v) activated charcoal in 3% (v/v) perchloric
acid. The precipitates were collected on glass fiber filters (2.4 cm, G-4, Fisher, Pittsburgh,
PA), washed successively with 10 ml of 0.2 M sodium phosphate buffer, pH 8.0, 4 ml of water,
and 1 ml of ethanol, and dried in air. The filters were mixed with 7 ml of scintillation fluid
(ScintiSafe Gel, Fisher) and counted on a Beckman LS-6800 scintillation counter to
determine the radioactivity.
In vitro aminoacylation of holo-BlmI by Val-A

The aminoacylation of holo-BlmI was carried out essentially according to
literature methods (Stachelhaus et al. (1996) Chem. Biol. 3: 913-921; Weinreb et al. (1998)
Biochemistry 37: 1575-1584). A typical 100 µl assay solution contained 180 nM Val-A,
1.5-2.8 µM apo- or holo-BlmI, 35 µM L-[14C(U)]-valine (283 mCi/mmol, NEN Life Science
Products, Inc., Boston, MA), 5 mM ATP, 10 mM MgCl2, and 5 mM DTT in 75 mM
Tris-HCl buffer, pH 8.0. The reactions were started by the addition of ATP and, after incubation
at 37 °C for 30 min, were stopped by addition of 0.9 ml of cold 7% (v/v) TCA. The
precipitated proteins were collected by centrifugation at 14,000 rpm, 20 min, 4 °C
(Eppendorf 5415C centrifuge) and resolved by SDS-PAGE on a 15% gel. The radioactivity
incorporated into the holo-BlmI-L-[14C(U)]valine species was similarly determined by
autoradiography as described for in vivo labeling of holo-BlmI with [3-3H]-β-alanine.
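
The absolute amounts implied by the aminoacylation assay composition can be worked out with a few lines of arithmetic. The Python sketch below is an illustration only; the helper names `nmol` and `microcuries` are hypothetical, and the inputs are simply the concentrations, volume, and specific activity quoted above.

```python
def nmol(conc_uM, vol_ul):
    """Amount in nmol from a concentration in µM and a volume in µl."""
    return conc_uM * vol_ul / 1000.0


def microcuries(amount_nmol, specific_activity_mCi_per_mmol):
    """Radioactivity in µCi; 1 mCi/mmol equals 0.001 µCi/nmol."""
    return amount_nmol * specific_activity_mCi_per_mmol / 1000.0


ASSAY_VOLUME_UL = 100

# L-[14C(U)]-valine: 35 µM at 283 mCi/mmol
valine_nmol = nmol(35, ASSAY_VOLUME_UL)        # 3.5 nmol
valine_uCi = microcuries(valine_nmol, 283)     # just under 1 µCi per assay

# Carrier protein: 1.5-2.8 µM apo- or holo-BlmI
blmI_nmol_low = nmol(1.5, ASSAY_VOLUME_UL)     # 0.15 nmol
blmI_nmol_high = nmol(2.8, ASSAY_VOLUME_UL)    # 0.28 nmol

# Val-A: 180 nM, expressed in pmol
val_a_pmol = nmol(0.180, ASSAY_VOLUME_UL) * 1000.0
```

Under these figures the labeled valine is in more than tenfold molar excess over the carrier protein, and Val-A (18 pmol) is present at a small fraction of the BlmI amount.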

Example 3:

Cloning and characterization of a phosphopantetheinyl transferase from the
bleomycin-producing Streptomyces verticillus ATCC15003

Multienzyme complexes exist for acyl group activation and transfer reactions
in the biogenesis of fatty acids, the polyketide family of natural products (e.g., erythromycin,
tetracycline), and almost all non-ribosomal peptides (e.g., vancomycin, cyclosporin,
penicillin). All of these complexes contain one or more small proteins, ~80-100 amino acids
long, either as separate subunits or as integrated domains, that function as carrier proteins for
the growing acyl chain (acyl-, peptidyl-, and aryl-carrier proteins, abbreviated as ACP, PCP,
and ArCP). They are converted from inactive apo-forms to functional holo-forms by the
covalent attachment of the 4'-phosphopantetheine moiety of coenzyme A to a conserved
serine residue of the carrier-protein substrate. This essential post-translational modification
is catalyzed by a family of enzymes known as phosphopantetheinyl transferases (PPTases)
(Lambalot et al. Chem. Biol. (1996) 3:923-936; Walsh et al. Curr. Opin. Chem. Biol.
(1997) 1:309-315).

Research in the field of polyketide and non-ribosomal peptide biosynthesis
has been hampered by the inability to fully modify, and thus convert to the active form, some
polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS) when overproduced in
heterologous hosts, presumably because the host PPTases are unable to effectively modify
these overexpressed protein substrates. Our group is currently involved in the
characterization of the gene cluster responsible for the biosynthesis of the antitumor drug
bleomycin in Streptomyces verticillus ATCC15003. As bleomycin synthetase is a hybrid
NRPS/PKS enzyme, we decided to obtain a PPTase from the producing organism in order to
use it in vitro or in vivo by coexpression with the synthetase genes to produce properly
modified, active synthetases for our studies.



Results and Discussion 



Cloning of the pptA gene from S. verticillus ATCC15003.

The similarities among PPTases from different organisms are reduced to two
short motifs separated by 40-45 residues: (V/I)G(V/I)D and (F/W)(S/C/T)XKE(A/S)hhK
(Lambalot et al. Chem. Biol. (1996) 3:923-936; Walsh et al. Curr. Opin. Chem. Biol.
(1997) 1:309-315). Our previous attempts to amplify PPTase sequences from S. verticillus
chromosomal DNA using degenerate primers based on the two conserved motifs were
unsuccessful (unpublished results), so we decided to narrow our target. PPTases have been
classified into two groups according to their specificity for the carrier-protein substrate:
PPTases involved in polyketide/fatty acid biosynthesis use acyl carrier proteins (ACPs) as
substrate, while those for non-ribosomal peptide biosynthesis use peptidyl carrier proteins
(PCPs) or aryl carrier proteins (ArCPs) (Walsh et al. Curr. Opin. Chem. Biol. (1997)
1:309-315). Several "NRPS-type" PPTase sequences were used to screen the databases to
look for actinomycete homologues, and four proteins of unknown function were found:
NshC from Streptomyces actuosus (Aet al. Gene (1990) 91:9-17), SC5A7.23 from S.
coelicolor (GenBank AL031107), an unnamed protein from Streptomyces sp. strain TH1
(Mori et al. J. Bacteriol. (1997) 179:5677-5683), and Rv2794c (later renamed PptT
(Quadri et al. Chem. Biol. (1998) 5:631-645)) from Mycobacterium tuberculosis (GenBank
AL008967). The alignment of the actinomycete sequences showed the two motifs conserved
in all PPTases and an additional motif, the "THC" motif: PXWPXGX2GS(M/L)THCXGY
(SEQ ID NO:86), located about 15 amino acids upstream of the (V/I)G(V/I)D motif (SEQ ID
NO:87). The "THC" motif is not universally conserved in all PPTases, but it can also be
detected in some non-actinomycete PPTases such as EntD (Coderre et al. J. Gen. Microbiol.
(1989) 135:3043-3055). Using a recently developed method of PCR primer design (the
CODEHOP strategy (COnsensus-DEgenerate Hybrid Oligonucleotide Primer) (Rose et al.
Nucleic Acids Res. (1998) 26:1628-1635)), two primers were designed around the typical
C-terminal PPTase motif (primers KEA-1: 5'-T GCA GCA GAA CAG GAG GCK NYC CCA
NKG-3' (SEQ ID NO:88) and KEA-2: 5'-TG GGT CAG CGG GTA CCA NRC YTT RWA-3'
(SEQ ID NO:89, H=C+A, N=A+C+T+G, Y=C+T, K=G+T, R=A+G, W=T+A)), and one
primer was designed from the "THC" motif (primer THC: 5'-C GGC ATG GTC GGC TCC
HTN ACN CAY TG-3', SEQ ID NO:90, H=C+A, N=A+C+T+G, Y=C+T, K=G+T,
R=A+G, W=T+A; this motif is not universally conserved in PPTases of all organisms).
Using S. verticillus chromosomal DNA as template, no amplification product was detected
using the THC and the KEA-1 primers. The set of primers THC/KEA-2 successfully
amplified a single band of the expected size (about 250 bp), which was gel-purified and
cloned. Eight individual clones were sequenced, and all of them proved to be identical
(except for differences due to primer utilization) and highly similar to the putative actinomycete
PPTases. The PCR fragment was used as a probe to screen an S. verticillus genomic library
by colony hybridization. Of the 10,000 colonies screened, 25 positive clones were
identified and then confirmed by Southern analysis to contain the same 4.6-kb BamHI
hybridizing band. The 4.6-kb DNA fragment was subcloned, and the nucleotide sequence
of a 1,761-bp BamHI-SalI region was determined (SEQ ID NO. 3).
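
The degeneracy of the three primers can be counted directly from the ambiguity legend given with the sequences. The Python sketch below is an illustration under stated assumptions: the helper names `degeneracy` and `expand` are hypothetical, and the legend is taken as written in the text, where H denotes C+A (under the usual IUPAC convention H would denote A+C+T, which would give a larger count for the THC primer).

```python
from itertools import product

# Ambiguity legend exactly as stated in the text. Assumption: H = C+A is
# kept as written, although IUPAC normally defines H as A+C+T.
LEGEND = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "H": "CA", "N": "ACTG", "Y": "CT", "K": "GT", "R": "AG", "W": "TA",
}


def clean(primer):
    """Drop spacing and any character that is not in the legend."""
    return "".join(ch for ch in primer.upper() if ch in LEGEND)


def degeneracy(primer):
    """Number of distinct oligonucleotides represented by the primer."""
    count = 1
    for base in clean(primer):
        count *= len(LEGEND[base])
    return count


def expand(primer):
    """List every concrete sequence encoded by a degenerate primer."""
    pools = [LEGEND[base] for base in clean(primer)]
    return ["".join(combo) for combo in product(*pools)]


KEA1 = "T GCA GCA GAA CAG GAG GCK NYC CCA NKG"
KEA2 = "TG GGT CAG CGG GTA CCA NRC YTT RWA"
THC = "C GGC ATG GTC GGC TCC HTN ACN CAY TG"

thc_variants = expand(THC)
```

With this legend the degenerate 3' cores stay small (128-fold for KEA-1, 64-fold for KEA-2 and THC), which is consistent with the CODEHOP design of a short degenerate core attached to a fixed consensus clamp.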

Sequence analysis of the pptA locus. 

The sequence of the 1,761-bp BamHI-SalI fragment was analyzed for coding
regions using the CODONPREFERENCE and TESTCODE programs of the GCG
package (Genetics Computer Group, Madison, Wisconsin). Two complete ORFs (pptA,
orf3) and two incomplete ORFs (orf1, orf4) were identified within the sequenced region
(Figure 13). The first ORF from left to right (designated orf1) starts outside the analyzed area
and ends with a TGA codon at position 248 of the sequenced fragment. Comparison of the
deduced product of orf1 with proteins in databases showed similarities with Rv2795c from
Mycobacterium tuberculosis (GenBank AL008967) and SC5A7.22 from S. coelicolor
(GenBank AL031107), both of unknown function. The second ORF, pptA, contains the
sequence amplified by PCR and used for the cloning of this locus. It comprises 741
nucleotides, starting with a GTG codon (position 245), which is coupled to the stop codon of
orf1, and ending with a TAA codon. The start codon of pptA is preceded by a potential
ribosomal binding site (RBS), GGGAG. The overall (76.6%) and third codon position
(93.9%) G+C contents and the codon usage of pptA are similar to those found in other
Streptomyces genes, with the exception of the stop codon (TAA), which is most uncommon
in this group of organisms (Wright et al. Gene (1992) 113:55-65). The pptA gene encodes a
protein of 246 amino acids with a predicted molecular mass of 25,619 Da and a pI of 4.76,
which contains the conserved PPTase motifs. Database searches with PptA showed
significant similarities to the putative actinomycete PPTases (39-52%/48-61%
identity/similarity) and to confirmed bacterial PPTases such as EntD from E. coli
(17%/24% identity/similarity) (Lambalot et al. Chem. Biol. (1996) 3:923-936). The third
ORF, orf3, is separated from pptA by an apparently noncoding DNA region of 153 bp, and it
is transcribed in the opposite and convergent direction with respect to orf1-pptA. The gene orf3
comprises 240 nucleotides, starting with an ATG codon (position 1358) and ending with
TGA. The start codon of orf3 is preceded by the sequence GAAGG, a potential RBS.
The deduced product of orf3 is a protein of 79 amino acids with a predicted mass of
7,555 Da and a pI of 7.17. The Orf3 protein shows similarities to the N-terminal region of
SC5H1.35c, a protein of unknown function from S. coelicolor (GenBank AL049863).
Analysis of Orf3 with the SignalP program (Nielsen et al. Protein Engineer. (1997) 10:1-6)
predicts an N-terminal signal peptide that would be cleaved between residues 27 and 28
(ALA-DS), suggesting that the mature protein (52 amino acids, 5,099 Da, pI 4.31) would be
secreted. Between orf3 and orf4 there is an apparently noncoding region of 251 nucleotides.
The orf4 gene is transcribed in the opposite and divergent direction with respect to orf3. It starts
with an ATG codon at position 1610, preceded by a potential RBS (GGAGG), and ends outside
the sequenced fragment. The deduced protein product (50 amino acids) of the incomplete
orf4 contains a potential NAD/FAD binding motif, GXGX2GX3GX6G (Scrutton et al.
Nature (1990) 343:38-43), showing low similarities to diverse oxidoreductases.
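
The overall and third-codon-position G+C contents quoted for pptA are simple ratios that can be computed from any coding sequence. The Python sketch below shows one way to do so; it is illustrative only, the function names are hypothetical, and the 15-nucleotide toy ORF is invented for the example and is not part of SEQ ID NO. 3.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc3_fraction(cds):
    """G+C fraction at the third position of each codon of an in-frame CDS."""
    return gc_fraction(cds.upper()[2::3])


def encoded_residues(cds_length_nt):
    """Amino acids encoded by a CDS whose length includes the stop codon."""
    return cds_length_nt // 3 - 1


def last_position(start, length_nt):
    """1-based coordinate of the last nucleotide of a forward-strand ORF."""
    return start + length_nt - 1


# Hypothetical toy ORF (GTG start, TAA stop), not the pptA sequence.
toy_orf = "GTGGCCGACCTGTAA"
toy_gc = gc_fraction(toy_orf)      # 9 of 15 bases are G or C
toy_gc3 = gc3_fraction(toy_orf)    # 4 of 5 third positions are G or C

# Consistency checks against the figures quoted in the text.
ppta_residues = encoded_residues(741)   # pptA: 741 nt
orf3_residues = encoded_residues(240)   # orf3: 240 nt
ppta_end = last_position(245, 741)      # pptA starts at position 245
```

The length arithmetic agrees with the text: 741 nucleotides correspond to 246 codons plus a stop codon, and 240 nucleotides to 79 codons plus a stop codon.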

Heterologous expression and biochemical characterization of PptA, 

In order to test whether pptA actually encodes a functional PPTase, we decided to 
overproduce and purify the PptA protein and assay its catalytic competence on putative 
substrate proteins or domains. The pptA coding sequence was amplified by PCR and cloned 
into the T5-promoter-based pQE-70 vector, yielding plasmid pQEPPT, in such a way that a 
hexahistidine tag would be added at the C-terminus of the protein. Expression of the 
pQEPPT construct in E. coli M15(pREP4) resulted in the overproduction of soluble His- 
tagged PptA, which was readily purified by affinity chromatography on Ni-NTA agarose 
under non-denaturing conditions (FIGURE). Because pptA belongs, by sequence similarity, 
to the subfamily of PPTases involved in nonribosomal peptide synthesis, we first assayed its 
activity using two different apo-PCPs as protein substrates. The first one, BlmI, has been 
previously characterized in our laboratory as a discrete peptidyl carrier protein, or type II 
PCP, whose gene is found within the bleomycin-biosynthesis gene cluster of S. verticillus 
(Du et al. Chem. Biol. (1999) 6:507-517). For the second PCP substrate we used BlmX, a 
bimodular NRPS protein encoded in the same cluster (Fig. 2), as a source of a type I PCP, 
i.e., a PCP included in a multidomain NRPS. For the production of this type I PCP, we 
amplified by PCR a 1,898 bp fragment encoding the adenylation and PCP domains from the 
second module of BlmX. This DNA fragment was cloned into pMAL-c2x to yield 
pMAL1617, in which the type I PCP would be produced as a maltose-binding protein (MBP) 
fusion, MBlmX-2, with a predicted molecular mass of 108.5 kDa. Introduction of 
pMAL1617 in E. coli TB1 resulted in good overproduction of MBlmX-2, about 40% 
soluble, which was purified by affinity chromatography using amylose resin. To test the 
PPTase activity, we incubated the purified PptA with BlmI and MBlmX-2 as putative protein 
substrates in the presence of (3H)-(pantetheinyl)-CoASH, and the tritiated products were 
subjected to SDS electrophoresis and autoradiography. The well-characterized PPTase Sfp 
from B. subtilis, which exhibits a broad specificity for its protein substrate (Quadri et al. 
Biochemistry (1998) 37:1585-1595), was included as a positive control. In these 
experiments PptA exhibited a robust phosphopantetheinylation activity on both BlmI and 
MBlmX-2. Having demonstrated that PptA does in fact have PPTase activity on both type I 
and type II PCP substrates from nonribosomal peptide synthetases, we then proceeded to test 
two different acyl carrier proteins (ACPs) as potential substrates. The first one, BlmVIII, is 
a monomodular multidomain polyketide synthase (PKS) which is encoded in the bleomycin- 
biosynthesis gene cluster of S. verticillus (Fig. 2). BlmVIII contains an ACP domain at its 
C-terminus, that is, a type I ACP. For the second ACP substrate we used TcmM, a type II 
acyl carrier protein involved in the biosynthesis of the aromatic polyketide tetracenomycin C 
in S. glaucescens (Shen et al. J. Bacteriol. (1992) 174:3818-3821; Bao et al. Biochemistry 
(1998) 37:8132-8138). For the production of TcmM, its coding sequence was transferred 
from a construct previously made in pET422b (Gehring et al. Chem. Biol. (1997) 4:17-24) 
into the pET-28a vector to yield pET28a-TcmM, in such a way that a hexahistidine tag 
would be added at both the N-terminus and the C-terminus of the protein. Plasmid pET28a- 
TcmM was introduced into E. coli BL21(DE3), and TcmM was easily purified by affinity 
chromatography using Ni-NTA resin. In vitro phosphopantetheinylation assays were 
performed as before, but using BlmVIII and TcmM as protein substrates, and PptA was able 
to posttranslationally modify both ACP substrates. 

The pptA gene is not clustered with the bleomycin-biosynthesis locus. 

Some bacterial PPTase genes have been found clustered with, or close to, their 
respective "partner" NRPS genes: entD {enterobactin (Coderre et al. J. Gen. Microbiol. 
(1989) 135:3043-3055)}, sfp {surfactin (Cosmina et al. Mol. Microbiol. (1993) 8:821- 
831)}, gsp {gramicidin (Borchert et al. J. Bacteriol. (1994) 176:2458-2462)}, bli 
{bacitracin (Gaidenko et al. Biotechnologia (1992) 13-19)}, lpa-14 {iturin (Huang et al. J. 
Ferment. Bioeng. (1993) 76:445-450)}. To test the possible clustering of pptA with the 
bleomycin-biosynthesis (blm) locus, PCR reactions were performed using the THC/KEA-2 
primers on several overlapping cosmid clones spanning the blm locus plus 30-40 kb 
upstream and downstream of its putative limits. No amplification product could be obtained 
in these reactions, showing that the pptA gene is not clustered with the blm locus. 

Discussion 

It has been suggested that in organisms containing multiple 
phosphopantetheine-requiring pathways, each pathway has its own posttranslational 
modifying activity (Walsh et al. Curr. Opin. Chem. Biol. (1997) 1:309-315). Our group 
has found that S. verticillus ATCC 15003 contains several PKS and NRPS gene clusters, one 
of them being responsible for bleomycin production (a hybrid NRPS/PKS system) (Shen et 
al. Bioorg. Chem. (1999) 27:155-171; Du et al. Chem. Biol. (1999) 6:507-517). This 
suggested that the gene encoding the PPTase for the BLM NRPS might also be clustered 
with, or close to, the NRPS genes. However, we have not found this gene after sequencing 
almost the whole blm NRPS locus. Because having this gene could be important for 
expressing functional NRPS modules from the blm cluster, we decided to clone the PPTase 
gene. Additionally, if the "one NRPS cluster - one PPTase" hypothesis were true, it seemed 
possible to use PPTase sequences as a new kind of probe to clone novel NRPS clusters. 

We know that in S. verticillus there are several NRPS loci (possibly four), so 
we expected several "PCP-type" PPTases. However, we have amplified only one, and it does 
not seem to be closely linked to any of the NRPS loci. Interestingly, in the actinomycete 
Mycobacterium tuberculosis, whose genome is fully sequenced, there is only one PCP-type 
PPTase gene, which is not clustered with either of the two NRPS loci present in this organism 
(Quadri et al. Chem. Biol. (1998) 5:631-645). This and other indirect evidence suggests 
that cluster-specific PPTases are not the general rule but most probably the 
exception, especially in organisms containing multiple NRPS clusters. There is also strong 
evidence that at least some PCP-type PPTases can posttranslationally modify PCPs from 
different clusters and even different organisms (Quadri et al. Chem. Biol. (1998) 5:631-645; 
Gehring et al. Biochemistry (1998) 37:11637-11650). It is most likely that there is only one 
PCP-type PPTase in S. verticillus and that its gene is not necessarily clustered with any of the 
NRPS loci. 

Biochemical characterization of the purified PptA protein confirmed not only 
its PPTase activity but also its broad specificity, comparable to that of Sfp. Different apo- 
PCPs (type I and type II) and a type I apo-ACP from the bleomycin synthetase, and the type 
II apo-ACP from the tetracenomycin PKS of Streptomyces glaucescens, were efficiently used 
as substrates by PptA. These results suggest PptA as a good candidate for heterologous 
coexpression with NRPS and PKS genes to overproduce active holo-synthase enzymes. 

It is understood that the examples and embodiments described herein are for 
illustrative purposes only and that various modifications or changes in light thereof will be 
suggested to persons skilled in the art and are to be included within the spirit and purview of 
this application and scope of the appended claims. All publications, patents, and patent 
applications cited herein are hereby incorporated by reference in their entirety for all 
purposes. 